May 5, 2019

Paper Group ANR 450

Transport Analysis of Infinitely Deep Neural Network


Title	Transport Analysis of Infinitely Deep Neural Network
Authors	Sho Sonoda, Noboru Murata
Abstract	We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and the interpretation of DNNs (what do intermediate layers do?). Despite the rapid development in their application, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, or the hidden unit number, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth or the hidden layer number, and it is specified by an ordinary differential equation with a vector field. We interpret an ordinary DNN as a transport map or an Euler broken line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a new way to the coordinate-free treatment of DNNs by avoiding the redundant parametrization of DNNs. Following Wasserstein geometry, we analyze a flow in three aspects: dynamical system, continuity equation, and Wasserstein gradient flow. A key finding is that we specified a series of transport maps of the denoising autoencoder (DAE). Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the development of the double continuum limit or the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and the extracted features are better; in addition, a deep Gaussian DAE transports mass to decrease the Shannon entropy of the data distribution.
Tasks	Denoising
Published	2016-05-10
URL	http://arxiv.org/abs/1605.02832v2
PDF	http://arxiv.org/pdf/1605.02832v2.pdf
PWC	https://paperswithcode.com/paper/transport-analysis-of-infinitely-deep-neural
Repo
Framework
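
The abstract above reads an ordinary DNN as an Euler broken line approximation of a flow defined by an ordinary differential equation. A minimal sketch of that reading follows, where each explicit Euler step plays the role of one layer. The vector field used here (the score of a standard Gaussian, which pulls points toward the origin) and the names `euler_flow` and `score_standard_gaussian` are assumptions chosen for illustration only; they are not taken from the paper.

```python
def score_standard_gaussian(x):
    # Hypothetical vector field: grad log p(x) = -x for a standard normal density.
    return [-xi for xi in x]

def euler_flow(x0, vector_field, step, n_layers):
    """Approximate the flow dx/dt = v(x) with n_layers explicit Euler steps.

    Returns the whole broken-line path, one point per "layer".
    """
    x = list(x0)
    path = [list(x)]
    for _ in range(n_layers):
        v = vector_field(x)
        # One Euler step: x <- x + step * v(x), i.e. a residual-style update.
        x = [xi + step * vi for xi, vi in zip(x, v)]
        path.append(list(x))
    return path

path = euler_flow([2.0, -1.0], score_standard_gaussian, step=0.1, n_layers=10)
print(path[-1])
```

With this particular field every step scales the point by `1 - step`, so the path contracts toward the origin; a smaller `step` with more layers tracks the continuous flow more closely.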

A Hackathon for Classical Tibetan


Title	A Hackathon for Classical Tibetan
Authors	Orna Almogi, Lena Dankin, Nachum Dershowitz, Lior Wolf
Abstract	We describe the course of a hackathon dedicated to the development of linguistic tools for Tibetan Buddhist studies. Over a period of five days, a group of seventeen scholars, scientists, and students developed and compared algorithms for intertextual alignment and text classification, along with some basic language tools, including a stemmer and word segmenter.
Tasks	Text Classification
Published	2016-09-27
URL	http://arxiv.org/abs/1609.08389v2
PDF	http://arxiv.org/pdf/1609.08389v2.pdf
PWC	https://paperswithcode.com/paper/a-hackathon-for-classical-tibetan
Repo
Framework

Mahalanobis Distance for Class Averaging of Cryo-EM Images


Title	Mahalanobis Distance for Class Averaging of Cryo-EM Images
Authors	Tejal Bhamre, Zhizhen Zhao, Amit Singer
Abstract	Single particle reconstruction (SPR) from cryo-electron microscopy (EM) is a technique in which the 3D structure of a molecule needs to be determined from its noisy 2D projection images, which are affected by the contrast transfer function (CTF) and taken at unknown viewing directions. One of the main challenges in cryo-EM is the typically low signal-to-noise ratio (SNR) of the acquired images. 2D classification of images, followed by class averaging, improves the SNR of the resulting averages, and is used for selecting particles from micrographs and for inspecting the particle images. We introduce a new affinity measure, akin to the Mahalanobis distance, to compare cryo-EM images belonging to different defocus groups. The new similarity measure is employed to detect similar images, thereby leading to an improved algorithm for class averaging. We evaluate the performance of the proposed class averaging procedure on synthetic datasets, obtaining state-of-the-art classification.
Tasks
Published	2016-11-10
URL	http://arxiv.org/abs/1611.03193v4
PDF	http://arxiv.org/pdf/1611.03193v4.pdf
PWC	https://paperswithcode.com/paper/mahalanobis-distance-for-class-averaging-of
Repo
Framework
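
The affinity measure in the abstract above is described only as akin to the Mahalanobis distance. As a point of reference, here is a minimal sketch of the textbook Mahalanobis distance, sqrt((u - v)^T C^{-1} (u - v)), in plain Python. It is not the paper's measure: the covariance matrix `cov` is simply passed in, and the helper `solve` is a hypothetical name for a small Gaussian elimination routine.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(u, v, cov):
    """Textbook Mahalanobis distance between vectors u and v under covariance cov."""
    diff = [ui - vi for ui, vi in zip(u, v)]
    z = solve(cov, diff)  # z = cov^{-1} (u - v), without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

print(mahalanobis([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

With an identity covariance the function reduces to the Euclidean distance; a larger variance along one axis shrinks differences along that axis.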

Fractional Order Load-Frequency Control of Interconnected Power Systems Using Chaotic Multi-objective Optimization


Title	Fractional Order Load-Frequency Control of Interconnected Power Systems Using Chaotic Multi-objective Optimization
Authors	Indranil Pan, Saptarshi Das
Abstract	Fractional order proportional-integral-derivative (FOPID) controllers are designed for load frequency control (LFC) of two interconnected power systems. Conflicting time domain design objectives are considered in a multi-objective optimization (MOO) based design framework to design the gains and the fractional differ-integral orders of the FOPID controllers in the two areas. Here, we explore the effect of augmenting two different chaotic maps along with the uniform random number generator (RNG) in the popular MOO algorithm, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II). Different measures of quality for MOO, e.g. hypervolume indicator, moment of inertia based diversity metric, total Pareto spread, and spacing metric, are adopted to select the best set of controller parameters from multiple runs of all the NSGA-II variants (i.e. nominal and chaotic versions). The chaotic versions of the NSGA-II algorithm are compared with the standard NSGA-II in terms of solution quality and computational time. In addition, the Pareto optimal fronts showing the trade-off between the two conflicting time domain design objectives are compared to show the advantage of using the FOPID controller over a simple PID controller. The nature of fast/slow and high/low noise amplification effects of the FOPID structure or the four quadrant operation in the two interconnected areas of the power system is also explored. A fuzzy logic based method is then adopted to select the best compromise solution from the best Pareto fronts corresponding to each MOO comparison criterion. The time domain system responses are shown for the fuzzy best compromise solutions under nominal operating conditions. A comparative analysis of the merits and demerits of each controller structure is then reported. A robustness analysis is also done for the PID and the FOPID controllers.
Tasks
Published	2016-11-29
URL	http://arxiv.org/abs/1611.09802v1
PDF	http://arxiv.org/pdf/1611.09802v1.pdf
PWC	https://paperswithcode.com/paper/fractional-order-load-frequency-control-of
Repo
Framework
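
The abstract above describes augmenting the uniform RNG inside NSGA-II with chaotic maps, without naming the maps. The sketch below shows the general idea using the logistic map, which is an assumption made here for illustration, as are the function names and the simple mutation operator. It is not the paper's NSGA-II variant.

```python
def logistic_map_stream(seed, r=4.0):
    """Yield an endless chaotic sequence in [0, 1] from x <- r * x * (1 - x).

    Some seeds (for example 0.25, 0.5, 0.75) collapse to a fixed point and
    should be avoided.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    x = seed
    while True:
        x = r * x * (1.0 - x)
        yield x

def chaotic_mutation(genes, rate, stream, lower, upper):
    """Resample each gene with probability `rate`, drawing every random
    number from `stream` instead of a uniform RNG (hypothetical operator)."""
    mutated = []
    for g in genes:
        if next(stream) < rate:
            mutated.append(lower + (upper - lower) * next(stream))
        else:
            mutated.append(g)
    return mutated

stream = logistic_map_stream(0.3)
print(chaotic_mutation([1.0, 2.0, 3.0], 0.5, stream, lower=0.0, upper=10.0))
```

Because the stream is a drop-in source of numbers in the unit interval, the same operator can be run with a uniform RNG for a like-for-like comparison of the nominal and chaotic versions.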

Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining


Title	Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining
Authors	Jenna Reps, Zhaoyang Guo, Haoyue Zhu, Uwe Aickelin
Abstract	Big longitudinal observational databases present the opportunity to extract new knowledge in a cost-effective manner. Unfortunately, the ability of these databases to be used for causal inference is limited due to the passive way in which the data are collected, resulting in various forms of bias. In this paper we investigate a method that can overcome these limitations and determine causal contrast set rules efficiently from big data. In particular, we present a new methodology for the purpose of identifying risk factors that increase a patient's likelihood of experiencing the known rare side effect of renal failure after ingesting aminosalicylates. The results show that the methodology was able to identify previously researched risk factors such as being prescribed diuretics and highlighted that patients with a higher than average risk of renal failure may be even more susceptible to experiencing it as a side effect after ingesting aminosalicylates.
Tasks	Causal Inference
Published	2016-07-20
URL	http://arxiv.org/abs/1607.05845v1
PDF	http://arxiv.org/pdf/1607.05845v1.pdf
PWC	https://paperswithcode.com/paper/identifying-candidate-risk-factors-for
Repo
Framework

Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions


Title	Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
Authors	Johanna Carvajal, Arnold Wiliem, Chris McCool, Brian Lovell, Conrad Sanderson
Abstract	We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity in more challenging conditions (variations in scale and translation). Despite recent advancements for handling manifolds, manifold based techniques obtain the lowest performance and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV best deals with moderate scale and translation changes.
Tasks	Temporal Action Localization
Published	2016-02-04
URL	http://arxiv.org/abs/1602.01599v3
PDF	http://arxiv.org/pdf/1602.01599v3.pdf
PWC	https://paperswithcode.com/paper/comparative-evaluation-of-action-recognition
Repo
Framework

Neural computation from first principles: Using the maximum entropy method to obtain an optimal bits-per-joule neuron


Title	Neural computation from first principles: Using the maximum entropy method to obtain an optimal bits-per-joule neuron
Authors	William B Levy, Toby Berger, Mustafa Sungkar
Abstract	Optimization results are one method for understanding neural computation from Nature’s perspective and for defining the physical limits on neuron-like engineering. Earlier work looks at individual properties or performance criteria and occasionally a combination of two, such as energy and information. Here we make use of Jaynes’ maximum entropy method and combine a larger set of constraints, possibly dimensionally distinct, each expressible as an expectation. The method identifies a likelihood-function and a sufficient statistic arising from each such optimization. This likelihood is a first-hitting time distribution in the exponential class. Particular constraint sets are identified that, from an optimal inference perspective, justify earlier neurocomputational models. Interactions between constraints, mediated through the inferred likelihood, restrict constraint-set parameterizations, e.g., the energy-budget limits estimation performance which, in turn, matches an axonal communication constraint. Such linkages are, for biologists, experimental predictions of the method. In addition to the related likelihood, at least one type of constraint set implies marginal distributions, and in this case, a Shannon bits/joule statement arises.
Tasks
Published	2016-06-06
URL	http://arxiv.org/abs/1606.03063v2
PDF	http://arxiv.org/pdf/1606.03063v2.pdf
PWC	https://paperswithcode.com/paper/neural-computation-from-first-principles
Repo
Framework

Recurrent neural network training with preconditioned stochastic gradient descent


Title	Recurrent neural network training with preconditioned stochastic gradient descent
Authors	Xi-Lin Li
Abstract	This paper studies the performance of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on recurrent neural network (RNN) training. PSGD adaptively estimates a preconditioner to accelerate gradient descent, and is designed to be simple, general, and easy to use, like stochastic gradient descent (SGD). RNNs, especially the ones requiring extremely long term memories, are difficult to train. We have tested PSGD on a set of synthetic pathological RNN learning problems and the real world MNIST handwritten digit recognition task. Experimental results suggest that PSGD is able to achieve highly competitive performance without using any tricks such as preprocessing, pretraining or parameter tweaking.
Tasks	Handwritten Digit Recognition
Published	2016-06-14
URL	http://arxiv.org/abs/1606.04449v2
PDF	http://arxiv.org/pdf/1606.04449v2.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-network-training-with
Repo
Framework

Automated Segmentation of Retinal Layers from Optical Coherent Tomography Images Using Geodesic Distance


Title	Automated Segmentation of Retinal Layers from Optical Coherent Tomography Images Using Geodesic Distance
Authors	Jinming Duan, Christopher Tench, Irene Gottlob, Frank Proudlock, Li Bai
Abstract	Optical coherence tomography (OCT) is a non-invasive imaging technique that can produce images of the eye at the microscopic level. OCT image segmentation to localise retinal layer boundaries is a fundamental procedure for diagnosing and monitoring the progression of retinal and optic nerve disorders. In this paper, we introduce a novel and accurate geodesic distance method (GDM) for OCT segmentation of both healthy and pathological images in either two- or three-dimensional spaces. The method uses a geodesic distance weighted by an exponential function, taking into account both horizontal and vertical intensity variations. The weighted geodesic distance is efficiently calculated from an Eikonal equation via the fast sweeping method. The segmentation is then realised by solving an ordinary differential equation with the geodesic distance. The results of the GDM are compared with manually segmented retinal layer boundaries/surfaces. Extensive experiments demonstrate that the proposed GDM is robust to complex retinal structures with large curvatures and irregularities and that it outperforms the parametric active contour algorithm as well as the graph-theoretic approaches for delineating the retinal layers in both healthy and pathological images.
Tasks	Semantic Segmentation
Published	2016-09-07
URL	http://arxiv.org/abs/1609.02214v1
PDF	http://arxiv.org/pdf/1609.02214v1.pdf
PWC	https://paperswithcode.com/paper/automated-segmentation-of-retinal-layers-from
Repo
Framework

Shift-Reduce Constituent Parsing with Neural Lookahead Features


Title	Shift-Reduce Constituent Parsing with Neural Lookahead Features
Authors	Jiangming Liu, Yue Zhang
Abstract	Transition-based models can be fast and accurate for constituent parsing. Compared with chart-based models, they leverage richer features by extracting history information from a parser stack, which spans over non-local constituents. On the other hand, during incremental parsing, constituent information on the right hand side of the current word is not utilized, which is a relative weakness of shift-reduce parsing. To address this limitation, we leverage a fast neural model to extract lookahead features. In particular, we build a bidirectional LSTM model, which leverages the full sentence information to predict the hierarchy of constituents that each word starts and ends. The results are then passed to a strong transition-based constituent parser as lookahead features. The resulting parser gives 1.3% absolute improvement in WSJ and 2.3% in CTB compared to the baseline, giving the highest reported accuracies for fully-supervised parsing.
Tasks
Published	2016-12-02
URL	http://arxiv.org/abs/1612.00567v1
PDF	http://arxiv.org/pdf/1612.00567v1.pdf
PWC	https://paperswithcode.com/paper/shift-reduce-constituent-parsing-with-neural
Repo
Framework

Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages


Title	Scaling Factorial Hidden Markov Models: Stochastic Variational Inference without Messages
Authors	Yin Cheng Ng, Pawel Chilinski, Ricardo Silva
Abstract	Factorial Hidden Markov Models (FHMMs) are powerful models for sequential data but they do not scale well with long sequences. We propose a scalable inference and learning algorithm for FHMMs that draws on ideas from the stochastic variational inference, neural network and copula literatures. Unlike existing approaches, the proposed algorithm requires no message passing procedure among latent variables and can be distributed to a network of computers to speed up learning. Our experiments corroborate that the proposed algorithm does not introduce further approximation bias compared to the proven structured mean-field algorithm, and achieves better performance with long sequences and large FHMMs.
Tasks
Published	2016-08-12
URL	http://arxiv.org/abs/1608.03817v3
PDF	http://arxiv.org/pdf/1608.03817v3.pdf
PWC	https://paperswithcode.com/paper/scaling-factorial-hidden-markov-models
Repo
Framework

A Siamese Long Short-Term Memory Architecture for Human Re-Identification


Title	A Siamese Long Short-Term Memory Architecture for Human Re-Identification
Authors	Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, Gang Wang
Abstract	Matching pedestrians across multiple camera views, known as human re-identification, is a challenging problem in visual surveillance. In the existing works concentrating on feature extraction, representations are formed locally and independently of other regions. We present a novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and enhance the discriminative capability of local feature representation by leveraging contextual information. The feedback connections and internal gating mechanism of the LSTM cells enable our model to memorize the spatial dependencies and selectively propagate relevant contextual information through the network. We demonstrate improved performance compared to the baseline algorithm with no LSTM units and promising results compared to state-of-the-art methods on Market-1501, CUHK03 and VIPeR datasets. Visualization of the internal mechanism of LSTM cells shows meaningful patterns can be learned by our method.
Tasks	Person Re-Identification
Published	2016-07-28
URL	http://arxiv.org/abs/1607.08381v1
PDF	http://arxiv.org/pdf/1607.08381v1.pdf
PWC	https://paperswithcode.com/paper/a-siamese-long-short-term-memory-architecture
Repo
Framework

Recurrent Attention Models for Depth-Based Person Identification


Title	Recurrent Attention Models for Depth-Based Person Identification
Authors	Albert Haque, Alexandre Alahi, Li Fei-Fei
Abstract	We present an attention-based model that reasons on human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark. Our approach leverages unique 4D spatio-temporal signatures to address the identification problem across days. Formulated as a reinforcement learning task, our model is based on a combination of convolutional and recurrent neural networks with the goal of identifying small, discriminative regions indicative of human identity. We demonstrate that our model produces state-of-the-art results on several published datasets given only depth images. We further study the robustness of our model towards viewpoint, appearance, and volumetric changes. Finally, we share insights gleaned from interpretable 2D, 3D, and 4D visualizations of our model’s spatio-temporal attention.
Tasks	Person Identification
Published	2016-11-22
URL	http://arxiv.org/abs/1611.07212v1
PDF	http://arxiv.org/pdf/1611.07212v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-attention-models-for-depth-based
Repo
Framework

The IBM Speaker Recognition System: Recent Advances and Error Analysis


Title	The IBM Speaker Recognition System: Recent Advances and Error Analysis
Authors	Seyed Omid Sadjadi, Jason Pelecanos, Sriram Ganapathy
Abstract	We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (as opposed to LDA) for intersession variability compensation in the i-vector space, the application of speaker and channel-adapted features derived from an automatic speech recognition (ASR) system for speaker recognition, and the use of a DNN acoustic model with a very large number of output units (~10k senones) to compute the frame-level soft alignments required in the i-vector estimation process. We evaluate these techniques on the NIST 2010 SRE extended core conditions (C1-C9), as well as the 10sec-10sec condition. To our knowledge, results achieved by our system represent the best performances published to date on these conditions. For example, on the extended tel-tel condition (C5) the system achieves an EER of 0.59%. To garner further understanding of the remaining errors (on C5), we examine the recordings associated with the low scoring target trials, where various issues are identified for the problematic recordings/trials. Interestingly, it is observed that correcting the pathological recordings not only improves the scores for the target trials but also for the nontarget trials.
Tasks	Speaker Recognition, Speech Recognition
Published	2016-05-05
URL	http://arxiv.org/abs/1605.01635v1
PDF	http://arxiv.org/pdf/1605.01635v1.pdf
PWC	https://paperswithcode.com/paper/the-ibm-speaker-recognition-system-recent
Repo
Framework
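
The abstract above reports results as an equal error rate (EER), the operating point where the false acceptance rate on nontarget trials equals the false rejection rate on target trials. The sketch below shows one simple way to estimate it from two lists of scores by sweeping thresholds. The function name and the threshold-sweep approximation are assumptions for illustration; this is not the NIST scoring tool or the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The estimate is the
    mean of the two error rates at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for thr in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: a nontarget trial scored at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

print(equal_error_rate([0.6, 0.8, 0.9, 0.4], [0.1, 0.3, 0.5, 0.7]))
```

On small trial lists the two error curves rarely cross exactly, which is why the sketch averages them at the closest point; evaluation toolkits typically interpolate instead.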

Crossing the Road Without Traffic Lights: An Android-based Safety Device


Title	Crossing the Road Without Traffic Lights: An Android-based Safety Device
Authors	Adi Perry, Dor Verbin, Nahum Kiryati
Abstract	In the absence of pedestrian crossing lights, finding a safe moment to cross the road is often hazardous and challenging, especially for people with visual impairments. We present a reliable low-cost solution, an Android device attached to a traffic sign or lighting pole near the crossing, indicating whether it is safe to cross the road. The indication can be by sound, display, vibration, and various communication modalities provided by the Android device. The integral system camera is aimed at approaching traffic. Optical flow is computed from the incoming video stream, and projected onto an influx map, automatically acquired during a brief training period. The crossing safety is determined based on a 1-dimensional temporal signal derived from the projection. We implemented the complete system on a Samsung Galaxy K-Zoom Android smartphone, and obtained real-time operation. The system achieves promising experimental results, providing pedestrians with sufficiently early warning of approaching vehicles. The system can serve as a stand-alone safety device that can be installed where pedestrian crossing lights are ruled out. Requiring no dedicated infrastructure, it can be powered by a solar panel and remotely maintained via the cellular network.
Tasks	Optical Flow Estimation
Published	2016-10-11
URL	http://arxiv.org/abs/1610.03393v1
PDF	http://arxiv.org/pdf/1610.03393v1.pdf
PWC	https://paperswithcode.com/paper/crossing-the-road-without-traffic-lights-an
Repo
Framework
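
The abstract above describes projecting optical flow onto a learned influx map and deriving a 1-dimensional temporal signal from the projection. The sketch below illustrates that pipeline shape under strong assumptions: the influx map is taken to be a grid of per-pixel direction vectors, the projection is a sum of per-pixel dot products, and safety is a fixed threshold on the result. None of these details, nor the function names, come from the paper.

```python
def project_flow(flow, influx_map):
    """Sum of per-pixel dot products between a flow field and the influx map.

    Both arguments are 2D grids (lists of rows) of (dx, dy) tuples.
    """
    total = 0.0
    for flow_row, influx_row in zip(flow, influx_map):
        for (fx, fy), (ix, iy) in zip(flow_row, influx_row):
            total += fx * ix + fy * iy
    return total

def safety_signal(flow_frames, influx_map, threshold):
    """Map each frame to True (assumed safe) when its projection is below threshold."""
    return [project_flow(frame, influx_map) < threshold for frame in flow_frames]

# Hypothetical 1x2 influx map: approaching traffic moves in the +x direction.
influx = [[(1.0, 0.0), (1.0, 0.0)]]
quiet = [[(0.0, 0.0), (0.0, 0.0)]]
busy = [[(2.0, 0.0), (1.0, 1.0)]]
print(safety_signal([quiet, busy], influx, threshold=1.0))
```

Motion aligned with the map contributes positively to the signal, while motion perpendicular to it contributes nothing, so the scalar responds mainly to traffic moving along the learned approach direction.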