Paper Group ANR 312
Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification
Title | Deep Linear Discriminant Analysis on Fisher Networks: A Hybrid Architecture for Person Re-identification |
Authors | Lin Wu, Chunhua Shen, Anton van den Hengel |
Abstract | Person re-identification seeks a correct match for a person of interest across views among a large number of imposters. It typically involves two procedures: non-linear feature extraction against dramatic appearance changes, and subsequent discriminative analysis to reduce intra-personal variations while enlarging inter-personal differences. In this paper, we introduce a hybrid architecture which combines Fisher vectors and deep neural networks to learn non-linear representations of person images in a space where the data are linearly separable. We enforce a Linear Discriminant Analysis (LDA) on top of the deep neural network such that linearly separable latent representations can be learnt in an end-to-end fashion. By optimizing an objective function modified from LDA, the network is driven to produce feature distributions which have low variance within the same class and high variance between classes. The objective is essentially derived from the general LDA eigenvalue problem and allows the network to be trained with stochastic gradient descent, back-propagating LDA gradients to compute the gradients involved in Fisher vector encoding. For evaluation we test our approach on four benchmark data sets in person re-identification (VIPeR [1], CUHK03 [2], CUHK01 [3], and Market1501 [4]). Extensive experiments on these benchmarks show that our model achieves state-of-the-art results. |
Tasks | Person Re-Identification |
Published | 2016-06-06 |
URL | http://arxiv.org/abs/1606.01595v1 |
PDF | http://arxiv.org/pdf/1606.01595v1.pdf |
PWC | https://paperswithcode.com/paper/deep-linear-discriminant-analysis-on-fisher |
Repo | |
Framework | |
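The LDA-derived objective in the abstract above lends itself to a compact illustration. Below is a minimal NumPy sketch of a DeepLDA-style criterion on a batch of features: it builds within- and between-class scatter matrices, solves the generalized eigenvalue problem, and scores the batch by the smallest of the informative eigenvalues. This is one plausible reading of the objective, not the authors' code; `n_eig` and the `eps` regularizer are assumptions.

```python
import numpy as np

def lda_objective(features, labels, n_eig=2, eps=1e-4):
    """DeepLDA-style score: mean of the n_eig smallest among the C-1
    informative generalized eigenvalues of S_b v = lambda S_w v."""
    classes = np.unique(labels)
    d = features.shape[1]
    mu = features.mean(axis=0)
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        X_c = features[labels == c]
        mu_c = X_c.mean(axis=0)
        S_w += (X_c - mu_c).T @ (X_c - mu_c)        # within-class scatter
        diff = (mu_c - mu)[:, None]
        S_b += len(X_c) * (diff @ diff.T)           # between-class scatter
    S_w += eps * np.eye(d)                          # keep the problem well posed
    eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(S_w, S_b)).real)
    relevant = eigvals[-(len(classes) - 1):]        # at most C-1 are nonzero
    # Maximizing the *smallest* informative eigenvalues pushes every
    # discriminative direction to separate classes, not just the easiest one.
    return relevant[:n_eig].mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=64)
X = rng.normal(size=(64, 16)) + y[:, None]          # 4 loosely separated classes
print(lda_objective(X, y))
```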
Chained Predictions Using Convolutional Neural Networks
Title | Chained Predictions Using Convolutional Neural Networks |
Authors | Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly |
Abstract | In this paper, we present an adaptation of the sequence-to-sequence model for structured output prediction in vision tasks. In this model the output variables for a given input are predicted sequentially using neural networks. The prediction for each output variable depends not only on the input but also on the previously predicted output variables. The model is applied to spatial localization tasks and uses convolutional neural networks (CNNs) for processing input images and a multi-scale deconvolutional architecture for making spatial predictions at each time step. We explore the impact of weight sharing with a recurrent connection matrix between consecutive predictions, and compare it to a formulation where these weights are not tied. Untied weights are particularly suited for problems with a fixed-size structure, where different classes of output are predicted at different steps. We show that chained predictions achieve top-performing results on human pose estimation from single images and videos. |
Tasks | Pose Estimation |
Published | 2016-05-08 |
URL | http://arxiv.org/abs/1605.02346v2 |
PDF | http://arxiv.org/pdf/1605.02346v2.pdf |
PWC | https://paperswithcode.com/paper/chained-predictions-using-convolutional |
Repo | |
Framework | |
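A toy sketch of the chaining idea from the abstract above, in the untied-weights formulation: each of T output variables gets its own small network that sees the input features plus all previously predicted outputs. Scalars stand in for the paper's CNN heatmaps; all shapes and names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

T, d_in, d_hid = 3, 8, 16   # 3 chained outputs, toy dimensions

def step(params, x, prev):
    # one prediction step: condition on the input and earlier outputs
    h = np.tanh(params["W_x"] @ x + params["W_p"] @ prev + params["b"])
    return params["W_o"] @ h

# Untied weights: a separate parameter set per step, the variant the
# abstract recommends for fixed-size structures such as human pose.
params = [{
    "W_x": rng.normal(scale=0.1, size=(d_hid, d_in)),
    "W_p": rng.normal(scale=0.1, size=(d_hid, T)),
    "b": np.zeros(d_hid),
    "W_o": rng.normal(scale=0.1, size=(1, d_hid)),
} for _ in range(T)]

x = rng.normal(size=d_in)
prev = np.zeros(T)                 # predictions made so far
for t in range(T):
    prev[t] = step(params[t], x, prev)[0]   # later steps see this output
print(prev)
```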
Regularizing Solutions to the MEG Inverse Problem Using Space-Time Separable Covariance Functions
Title | Regularizing Solutions to the MEG Inverse Problem Using Space-Time Separable Covariance Functions |
Authors | Arno Solin, Pasi Jylänki, Jaakko Kauramäki, Tom Heskes, Marcel A. J. van Gerven, Simo Särkkä |
Abstract | In magnetoencephalography (MEG) the conventional approach to source reconstruction is to solve the underdetermined inverse problem independently over time and space. Here we present how the conventional approach can be extended by regularizing the solution in space and time by a Gaussian process (Gaussian random field) model. Assuming a separable covariance function in space and time, the computational complexity of the proposed model becomes (without any further assumptions or restrictions) $\mathcal{O}(t^3 + n^3 + m^2n)$, where $t$ is the number of time steps, $m$ is the number of sources, and $n$ is the number of sensors. We apply the method to both simulated and empirical data, and demonstrate the efficiency and generality of our Bayesian source reconstruction approach which subsumes various classical approaches in the literature. |
Tasks | |
Published | 2016-04-17 |
URL | http://arxiv.org/abs/1604.04931v1 |
PDF | http://arxiv.org/pdf/1604.04931v1.pdf |
PWC | https://paperswithcode.com/paper/regularizing-solutions-to-the-meg-inverse |
Repo | |
Framework | |
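The stated complexity rests on never forming the full spatiotemporal covariance: with a separable kernel, the Gram matrix is the Kronecker product $K_t \otimes K_s$, and products with it factor into small spatial and temporal multiplications. A quick NumPy check of that identity (sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 5, 4
A = rng.normal(size=(t, t)); K_t = A @ A.T   # temporal covariance (t x t)
B = rng.normal(size=(n, n)); K_s = B @ B.T   # spatial covariance (n x n)
X = rng.normal(size=(n, t))                  # sensors x time

# Key identity behind the separable-covariance speedup:
# (K_t kron K_s) vec(X) == vec(K_s @ X @ K_t.T),
# so the full nt x nt covariance never has to be built.
lhs = np.kron(K_t, K_s) @ X.flatten(order="F")
rhs = (K_s @ X @ K_t.T).flatten(order="F")
print(np.allclose(lhs, rhs))  # True
```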
Category Theoretic Analysis of Photon-based Decision Making
Title | Category Theoretic Analysis of Photon-based Decision Making |
Authors | Makoto Naruse, Song-Ju Kim, Masashi Aono, Martin Berthel, Aurélien Drezet, Serge Huant, Hirokazu Hori |
Abstract | Decision making is a vital function in this age of machine learning and artificial intelligence, yet its physical realization and theoretical fundamentals are still not completely understood. In our former study, we demonstrated that single photons can be used to make decisions in uncertain, dynamically changing environments. The two-armed bandit problem was successfully solved using the dual probabilistic and particle attributes of single photons. In this study, we present a category theoretic modeling and analysis of single-photon-based decision making, including a quantitative analysis that is in agreement with the experimental results. A category theoretic model reveals the complex interdependencies of subject matter entities in a simplified manner, even in dynamically changing environments. In particular, the octahedral and braid structures in triangulated categories provide a better understanding and quantitative metrics of the underlying mechanisms of a single-photon decision maker. This study provides both insight and a foundation for analyzing more complex and uncertain problems, to further machine learning and artificial intelligence. |
Tasks | Decision Making |
Published | 2016-02-26 |
URL | http://arxiv.org/abs/1602.08199v3 |
PDF | http://arxiv.org/pdf/1602.08199v3.pdf |
PWC | https://paperswithcode.com/paper/category-theoretic-analysis-of-photon-based |
Repo | |
Framework | |
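For intuition about the decision maker the abstract analyzes (the earlier experimental work), here is a deliberately crude simulation of a photon-based two-armed bandit: arm choice follows the photon's detection probabilities behind a polarizer, and the polarization angle is nudged by rewards. Every detail here (the update rule, step size, reward probabilities) is an assumption for illustration, not the authors' scheme.

```python
import numpy as np

rng = np.random.default_rng(3)

# A photon hits a polarizing beam splitter: detected in arm 0 with
# probability cos^2(theta), arm 1 otherwise. Rewards nudge theta.
p_reward = [0.3, 0.7]    # hidden environment (arm 1 is better)
theta = np.pi / 4        # unbiased start: P(arm 0) = 0.5
step, pulls = 0.02, np.zeros(2)

for _ in range(2000):
    arm = 0 if rng.random() < np.cos(theta) ** 2 else 1
    reward = rng.random() < p_reward[arm]
    pulls[arm] += 1
    # a rewarded arm attracts the polarization angle; a failure repels it
    direction = -1 if arm == 0 else 1
    theta += step * direction * (1 if reward else -1)
    theta = np.clip(theta, 0.0, np.pi / 2)

print(pulls / pulls.sum())   # most pulls should go to arm 1
```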
An equivalence between high dimensional Bayes optimal inference and M-estimation
Title | An equivalence between high dimensional Bayes optimal inference and M-estimation |
Authors | Madhu Advani, Surya Ganguli |
Abstract | When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the number of measurements. In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure. This procedure involves minimizing convex loss and regularizer functions that are nonlinearly smoothed versions of the widely applied MAP optimization problem. Our findings provide a new heuristic derivation and interpretation for recent optimal M-estimators found in the setting of linear measurements and additive noise, and further extend these results to nonlinear measurements with non-additive noise. We numerically demonstrate superior performance of our optimal M-estimators relative to MAP. Overall, at the heart of our work is the revelation of a remarkable equivalence between two seemingly very different computational problems: namely that of high dimensional Bayesian integration underlying MMSE inference, and high dimensional convex optimization underlying M-estimation. In essence we show that the former difficult integral may be computed by solving the latter, simpler optimization problem. |
Tasks | |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07060v1 |
PDF | http://arxiv.org/pdf/1609.07060v1.pdf |
PWC | https://paperswithcode.com/paper/an-equivalence-between-high-dimensional-bayes |
Repo | |
Framework | |
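The "nonlinearly smoothed versions" of the MAP objective can be pictured with a Moreau-type envelope, which replaces a convex penalty by its inf-convolution with a quadratic. The sketch below checks numerically that the Moreau envelope of the absolute value is the Huber function; the paper's optimal smoothing is posterior-dependent and more involved, so this only illustrates the smoothing operation itself.

```python
import numpy as np

def moreau_env(f, x, gamma, grid):
    # brute-force Moreau envelope: min_z f(z) + (x - z)^2 / (2 gamma)
    return (f(grid) + (x - grid) ** 2 / (2 * gamma)).min()

def huber(x, gamma):
    # known closed form of the envelope when f(z) = |z|
    return np.where(np.abs(x) <= gamma,
                    x ** 2 / (2 * gamma),
                    np.abs(x) - gamma / 2)

grid = np.linspace(-10, 10, 200001)
for x in (-2.0, 0.3, 1.5):
    print(float(moreau_env(np.abs, x, 0.5, grid)), float(huber(x, 0.5)))
```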
Deep Stereo Matching with Dense CRF Priors
Title | Deep Stereo Matching with Dense CRF Priors |
Authors | Ron Slossberg, Aaron Wetzler, Ron Kimmel |
Abstract | Stereo reconstruction from rectified images has recently been revisited within the context of deep learning. Using a deep Convolutional Neural Network to obtain patch-wise matching cost volumes has resulted in state-of-the-art stereo reconstruction on classic datasets like Middlebury and KITTI. By introducing this cost into a classical stereo pipeline, the final results are improved dramatically over non-learning-based cost models. However, these pipelines typically include hand-engineered post-processing steps to effectively regularize and clean the result. Here, we show that it is possible to take a more holistic approach by training a fully end-to-end network which directly includes regularization in the form of a densely connected Conditional Random Field (CRF) that acts as a prior on inter-pixel interactions. We demonstrate that our approach outperforms an alternative end-to-end network on both synthetic and real-world datasets and compares favorably to more hand-engineered approaches. |
Tasks | Stereo Matching |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01725v2 |
PDF | http://arxiv.org/pdf/1612.01725v2.pdf |
PWC | https://paperswithcode.com/paper/deep-stereo-matching-with-dense-crf-priors |
Repo | |
Framework | |
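A toy version of the cost volume the abstract starts from: for each pixel and candidate disparity, a matching cost between the rectified left and right images. Plain SSD stands in for the learned CNN cost, and there is no CRF regularization here; sizes and the synthetic disparity are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# cost[y, x, d] scores matching left pixel (y, x) to right pixel (y, x - d).
H, W, D = 8, 32, 5
left = rng.normal(size=(H, W))
right = np.roll(left, -2, axis=1) + 0.05 * rng.normal(size=(H, W))  # true disparity 2

cost = np.empty((H, W, D))
for d in range(D):
    cost[:, :, d] = (left - np.roll(right, d, axis=1)) ** 2

disparity = cost.argmin(axis=2)   # winner-take-all, no smoothing prior
print(disparity[:, 4:-4])         # mostly 2 away from the wrap-around borders
```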
Natural Language Semantics and Computability
Title | Natural Language Semantics and Computability |
Authors | Richard Moot, Christian Retoré |
Abstract | This paper is a reflection on the computability of natural language semantics. It does not contain a new model or new results in the formal semantics of natural language: it is rather a computational analysis of the logical models and algorithms currently used in natural language semantics, defined as the mapping of a statement to logical formulas (formulas in the plural, because a statement can be ambiguous). We argue that as long as possible world semantics is left out, one can compute the semantic representation(s) of a given statement, including aspects of lexical meaning. We also discuss the algorithmic complexity of this process. |
Tasks | |
Published | 2016-05-13 |
URL | http://arxiv.org/abs/1605.04122v1 |
PDF | http://arxiv.org/pdf/1605.04122v1.pdf |
PWC | https://paperswithcode.com/paper/natural-language-semantics-and-computability |
Repo | |
Framework | |
PCM and APCM Revisited: An Uncertainty Perspective
Title | PCM and APCM Revisited: An Uncertainty Perspective |
Authors | Peixin Hou, Hao Deng, Jiguang Yue, Shuguang Liu |
Abstract | In this paper, we take a new look at the possibilistic c-means (PCM) and adaptive PCM (APCM) clustering algorithms from the perspective of uncertainty. This new perspective offers us insights into the clustering process, and also provides a greater degree of flexibility. We analyze the clustering behavior of PCM-based algorithms and introduce parameters $\sigma_v$ and $\alpha$ to characterize the uncertainty of the estimated bandwidth and the noise level of the dataset, respectively. The uncertainty (fuzziness) of membership values caused by uncertainty in the estimated bandwidth parameter is then modeled by a conditional fuzzy set, which is a new formulation of the type-2 fuzzy set. Experiments show that the parameters $\sigma_v$ and $\alpha$ make the clustering process easier to control, and the main features of PCM and APCM are unified in this new clustering framework (UPCM). More specifically, UPCM reduces to PCM when we set a small $\alpha$ or a large $\sigma_v$, and UPCM reduces to APCM when clusters are confined to their physical extents and possible cluster elimination is ensured. Finally, we discuss directions for further research. |
Tasks | |
Published | 2016-10-27 |
URL | http://arxiv.org/abs/1610.08624v1 |
PDF | http://arxiv.org/pdf/1610.08624v1.pdf |
PWC | https://paperswithcode.com/paper/pcm-and-apcm-revisited-an-uncertainty |
Repo | |
Framework | |
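The classic PCM update that the paper builds on is short enough to sketch: typicalities depend on distances relative to a per-cluster bandwidth $\gamma_i$ (the quantity whose uncertainty $\sigma_v$ models), and centers are typicality-weighted means. A minimal NumPy version with fixed, hand-set bandwidths; the data and initialization are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Typicality of point j in cluster i:
#   u_ij = 1 / (1 + (d_ij^2 / gamma_i)^(1 / (m - 1)))
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
centers = np.array([[0.5, 0.5], [3.5, 3.5]], dtype=float)
gamma, m = np.array([1.0, 1.0]), 2.0

for _ in range(20):
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(-1)   # (c, N)
    u = 1.0 / (1.0 + (d2 / gamma[:, None]) ** (1.0 / (m - 1.0)))
    um = u ** m
    centers = (um @ X) / um.sum(axis=1, keepdims=True)          # weighted means

print(centers.round(2))   # close to the true means (0, 0) and (4, 4)
```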
Exact gradient updates in time independent of output size for the spherical loss family
Title | Exact gradient updates in time independent of output size for the spherical loss family |
Authors | Pascal Vincent, Alexandre de Brébisson, Xavier Bouthillier |
Abstract | An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as predicting the probability of next words among a vocabulary of size D (e.g. 200,000). Computing the equally large, but typically non-sparse D-dimensional output vector from a last hidden layer of reasonable dimension d (e.g. 500) incurs a prohibitive O(Dd) computational cost for each example, as does updating the $D \times d$ output weight matrix and computing the gradient needed for backpropagation to previous layers. While efficient handling of large sparse network inputs is trivial, the case of large sparse targets is not, and has thus so far been sidestepped with approximate alternatives such as hierarchical softmax or sampling-based approximations during training. In this work we develop an original algorithmic approach which, for a family of loss functions that includes squared error and spherical softmax, can compute the exact loss, gradient update for the output weights, and gradient for backpropagation, all in $O(d^{2})$ per example instead of $O(Dd)$, remarkably without ever computing the D-dimensional output. The proposed algorithm yields a speedup of up to $D/4d$ i.e. two orders of magnitude for typical sizes, for that critical part of the computations that often dominates the training time in this kind of network architecture. |
Tasks | Word Embeddings |
Published | 2016-06-26 |
URL | http://arxiv.org/abs/1606.08061v1 |
PDF | http://arxiv.org/pdf/1606.08061v1.pdf |
PWC | https://paperswithcode.com/paper/exact-gradient-updates-in-time-independent-of |
Repo | |
Framework | |
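For the squared-error member of the loss family, the abstract's claim can be verified in a few lines: with $Q = W^\top W$ cached, the loss against a K-sparse target never requires the D-dimensional output. The paper also maintains Q and the weight update implicitly in $O(d^2)$; this sketch only checks the loss identity.

```python
import numpy as np

rng = np.random.default_rng(6)

# ||W h - y||^2 = h' (W'W) h - 2 (y' W) h + ||y||^2.
# With Q = W'W (d x d) cached and y having K nonzeros, the loss costs
# O(d^2 + K d) -- the D-dimensional output W h is never formed.
D, d, K = 50_000, 50, 3
W = rng.normal(scale=0.01, size=(D, d))
Q = W.T @ W                       # maintained across updates in the paper
h = rng.normal(size=d)
idx = rng.choice(D, size=K, replace=False)
y_vals = rng.normal(size=K)       # sparse target: K nonzero entries

loss_fast = h @ Q @ h - 2 * (y_vals @ W[idx]) @ h + y_vals @ y_vals

y = np.zeros(D); y[idx] = y_vals  # naive O(Dd) reference
loss_naive = ((W @ h - y) ** 2).sum()
print(np.allclose(loss_fast, loss_naive))  # True
```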
ASAP: Asynchronous Approximate Data-Parallel Computation
Title | ASAP: Asynchronous Approximate Data-Parallel Computation |
Authors | Asim Kadav, Erik Kruus |
Abstract | Emerging workloads, such as graph processing and machine learning, are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines using bulk-synchronous processing (BSP) or other synchronous processing paradigms such as map-reduce. However, data-parallel processing primitives such as repeated barrier and reduce operations introduce high synchronization overheads. Hence, many existing data-processing platforms use asynchrony and staleness to improve data-parallel job performance. Often, these systems simply change the synchronous communication to asynchronous communication between the worker nodes in the cluster. This improves the throughput of data processing but results in poor accuracy of the final output, since different workers may progress at different speeds and process inconsistent intermediate outputs. In this paper, we present ASAP, a model that provides asynchronous and approximate processing semantics for data-parallel computation. ASAP provides fine-grained worker synchronization using NOTIFY-ACK semantics, which allows independent workers to run asynchronously. ASAP also provides a stochastic reduce that offers approximate but guaranteed convergence to the same result as an aggregated all-reduce. In our results, we show that ASAP reduces synchronization costs, provides 2-10X speedups in convergence and up to 10X savings in network costs for distributed machine learning applications, and provides strong convergence guarantees. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08608v1 |
PDF | http://arxiv.org/pdf/1612.08608v1.pdf |
PWC | https://paperswithcode.com/paper/asap-asynchronous-approximate-data-parallel |
Repo | |
Framework | |
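The flavor of a stochastic reduce can be seen in a toy gossip simulation: each round, one worker averages its parameter with a couple of random peers instead of joining a global all-reduce, yet all workers still drift to the global mean. This illustrates the relaxed-aggregation idea only; it is not the ASAP system or its NOTIFY-ACK protocol.

```python
import numpy as np

rng = np.random.default_rng(7)

# 8 workers, one scalar parameter each. Group averaging preserves the
# sum, so repeated partial reduces converge to the same result as a
# full synchronous all-reduce (the global mean).
workers = rng.normal(size=8)
target = workers.mean()

for _ in range(300):
    i = rng.integers(8)
    peers = rng.choice([k for k in range(8) if k != i], size=2, replace=False)
    group = np.append(peers, i)
    workers[group] = workers[group].mean()   # small, asynchronous reduce

print(np.abs(workers - target).max())        # ~0: agrees with all-reduce
```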
How Many Folders Do You Really Need?
Title | How Many Folders Do You Really Need? |
Authors | Mihajlo Grbovic, Guy Halawi, Zohar Karnin, Yoelle Maarek |
Abstract | Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently, however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approaches, because of the change in the nature of consumer email traffic, which is now dominated by (non-spam) machine-generated email. We propose here a novel approach for (1) automatically distinguishing between personal and machine-generated email and (2) classifying messages into latent categories, without requiring users to have defined any folder. We report how we have discovered that a set of 6 “latent” categories (one for human- and the others for machine-generated messages) can explain a significant portion of email traffic. We describe in detail the steps involved in building a Web-scale email categorization system, from the collection of ground-truth labels and the selection of features to the training of models. Experimental evaluation was performed on more than 500 billion messages received during a period of six months by users of the Yahoo mail service who elected to be part of such research studies. Our system achieved precision and recall rates close to 90%, and the latent categories we discovered were shown to cover 70% of both email traffic and email search queries. We believe that these results pave the way for a change of approach in the Web mail industry, and could support the invention of new large-scale email discovery paradigms that were not possible before. |
Tasks | |
Published | 2016-06-29 |
URL | http://arxiv.org/abs/1606.09296v1 |
PDF | http://arxiv.org/pdf/1606.09296v1.pdf |
PWC | https://paperswithcode.com/paper/how-many-folders-do-you-really-need |
Repo | |
Framework | |
Semi-Supervised Learning for Neural Machine Translation
Title | Semi-Supervised Learning for Neural Machine Translation |
Authors | Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu |
Abstract | While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems. |
Tasks | Machine Translation |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04596v3 |
PDF | http://arxiv.org/pdf/1606.04596v3.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-for-neural-machine |
Repo | |
Framework | |
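The autoencoder objective in the abstract above is easiest to see stripped to linear maps: S plays source-to-target, T target-to-source, and monolingual source vectors contribute a reconstruction term $\|TSx - x\|^2$ alongside the supervised losses on parallel pairs. A toy gradient-descent sketch; the linear "translators" and all hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
d = 4
A, _ = np.linalg.qr(rng.normal(size=(d, d)))            # ground-truth "translation"
X_par = rng.normal(size=(20, d)); Y_par = X_par @ A.T   # small parallel corpus
X_mono = rng.normal(size=(200, d))                      # monolingual source data

S = rng.normal(scale=0.1, size=(d, d))   # source -> target (encoder)
T = rng.normal(scale=0.1, size=(d, d))   # target -> source (decoder)
lr, lam = 0.02, 0.5
for _ in range(800):
    # supervised squared-error terms on the parallel corpus
    gS = 2 * (X_par @ S.T - Y_par).T @ X_par / len(X_par)
    gT = 2 * (Y_par @ T.T - X_par).T @ Y_par / len(Y_par)
    # reconstruction term || T S x - x ||^2 on monolingual source data
    R = X_mono @ S.T @ T.T - X_mono
    gT += lam * 2 * R.T @ (X_mono @ S.T) / len(X_mono)
    gS += lam * 2 * (R @ T).T @ X_mono / len(X_mono)
    S -= lr * gS; T -= lr * gT

# A is orthogonal here, so the ideal decoder is A.T; both gaps should be small.
print(np.abs(S - A).max(), np.abs(T - A.T).max())
```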
Feature-Augmented Neural Networks for Patient Note De-identification
Title | Feature-Augmented Neural Networks for Patient Note De-identification |
Authors | Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, Peter Szolovits |
Abstract | Patient notes contain a wealth of information of potentially great interest to medical investigators. However, to protect patients’ privacy, Protected Health Information (PHI) must be removed from the patient notes before they can be legally released, a process known as patient note de-identification. The main objective for a de-identification system is to have the highest possible recall. Recently, the first neural-network-based de-identification system has been proposed, yielding state-of-the-art results. Unlike other systems, it does not rely on human-engineered features, which allows it to be quickly deployed, but it does not leverage knowledge from human experts or from electronic health records (EHRs). In this work, we explore a method to incorporate human-engineered features as well as features derived from EHRs into a neural-network-based de-identification system. Our results show that the addition of features, especially the EHR-derived features, further improves the state of the art in patient note de-identification, including for some of the most sensitive PHI types such as patient names. Since in a real-life setting patient notes typically come with EHRs, we recommend that developers of de-identification systems leverage the information EHRs contain. |
Tasks | |
Published | 2016-10-30 |
URL | http://arxiv.org/abs/1610.09704v1 |
PDF | http://arxiv.org/pdf/1610.09704v1.pdf |
PWC | https://paperswithcode.com/paper/feature-augmented-neural-networks-for-patient |
Repo | |
Framework | |
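The feature-augmentation step itself is simple to picture: engineered indicators are concatenated onto each token's embedding before the sequence-labeling layers. A schematic sketch; the dimensions and the two example features are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)

# 7 tokens with 50-d embeddings, each augmented with 2 hand-engineered
# binary features (hypothetical: matches a date pattern, and appears in
# the EHR-derived patient-name list).
emb = rng.normal(size=(7, 50))
feats = np.array([[0, 1], [0, 0], [1, 0], [0, 0],
                  [0, 0], [1, 1], [0, 0]], dtype=float)
augmented = np.concatenate([emb, feats], axis=1)   # input to the tagger
print(augmented.shape)  # (7, 52)
```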
A Draft Memory Model on Spiking Neural Assemblies
Title | A Draft Memory Model on Spiking Neural Assemblies |
Authors | João Ranhel, João H. Albuquerque, Bruno P. M. Azevedo, Nathalia M. Cunha, Pedro J. Ishimaru |
Abstract | A draft memory model (DM) for spiking neural networks with spike propagation delay (SNNwD) is described. The novelty of this approach is that the DM learns immediately, with stimuli presented once, without synaptic weight changes, and without an external learning algorithm. The basis of the model is the trapping of spikes within neural loops. In order to construct the DM we developed two functional blocks, also described herein. The decoder block receives input from a single spike source and connects it to one among many outputs. The selector block operates in the opposite direction, receiving many spike sources and connecting one of them to a single output. We provide proofs of concept by testing the DM on a prime-number classification task. This activation-based memory can be used as immediate and short-term memory. |
Tasks | |
Published | 2016-03-26 |
URL | http://arxiv.org/abs/1603.08146v1 |
PDF | http://arxiv.org/pdf/1603.08146v1.pdf |
PWC | https://paperswithcode.com/paper/a-draft-memory-model-on-spiking-neural |
Repo | |
Framework | |
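The trapped-spike idea admits a tiny toy: a spike injected once into a delay loop keeps circulating, so the bit is held as activity rather than as a weight change. A caricature in plain Python; the ring length and the read/write conventions are invented for illustration.

```python
from collections import deque

# A 4-neuron ring with unit propagation delay per hop. Writing injects
# one spike; reading checks for activity anywhere in the loop. No
# weights are ever modified -- the memory is the circulating spike.
loop = deque([0, 0, 0, 0])

def write_one(loop): loop[0] = 1      # single stimulus, presented once
def read(loop): return int(any(loop))
def tick(loop): loop.rotate(1)        # spike advances one delay step

print(read(loop))                     # 0: nothing stored yet
write_one(loop)
for _ in range(10):
    tick(loop)
print(read(loop))                     # 1: the bit persists as activity
```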
Probabilistic Receiver Architecture Combining BP, MF, and EP for Multi-Signal Detection
Title | Probabilistic Receiver Architecture Combining BP, MF, and EP for Multi-Signal Detection |
Authors | Daniel J. Jakubisin, R. Michael Buehrer, Claudio R. C. M. da Silva |
Abstract | Receiver algorithms which combine belief propagation (BP) with the mean field (MF) approximation are well-suited for inference of both continuous and discrete random variables. In wireless scenarios involving detection of multiple signals, the standard construction of the combined BP-MF framework includes the equalization or multi-user detection functions within the MF subgraph. In this paper, we show that the MF approximation is not particularly effective for multi-signal detection. We develop a new factor graph construction for application of the BP-MF framework to problems involving the detection of multiple signals. We then develop a low-complexity variant to the proposed construction in which Gaussian BP is applied to the equalization factors. In this case, the factor graph of the joint probability distribution is divided into three subgraphs: (i) a MF subgraph comprised of the observation factors and channel estimation, (ii) a Gaussian BP subgraph which is applied to multi-signal detection, and (iii) a discrete BP subgraph which is applied to demodulation and decoding. Expectation propagation is used to approximate discrete distributions with a Gaussian distribution and links the discrete BP and Gaussian BP subgraphs. The result is a probabilistic receiver architecture with strong theoretical justification which can be applied to multi-signal detection. |
Tasks | Graph Construction |
Published | 2016-04-17 |
URL | http://arxiv.org/abs/1604.04834v1 |
PDF | http://arxiv.org/pdf/1604.04834v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-receiver-architecture-combining |
Repo | |
Framework | |
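The expectation propagation link between the discrete-BP and Gaussian-BP subgraphs is a moment-matching step: project the discrete symbol belief onto the Gaussian with the same mean and variance. A minimal sketch for a 4-PAM alphabet; the alphabet and beliefs are made up for illustration.

```python
import numpy as np

# A discrete belief over modulation symbols is replaced by the Gaussian
# with matching first and second moments -- the message handed from the
# discrete-BP subgraph to the Gaussian-BP equalization subgraph.
symbols = np.array([-3.0, -1.0, 1.0, 3.0])   # 4-PAM alphabet
logits = np.array([0.1, 2.0, 0.5, -1.0])     # unnormalized symbol beliefs
p = np.exp(logits - logits.max()); p /= p.sum()

mean = p @ symbols
var = p @ (symbols - mean) ** 2
print(mean, var)   # parameters of the matched Gaussian message
```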