October 17, 2019

Paper Group ANR 691

Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference. Camera-based vehicle velocity estimation from monocular video. Positive and Unlabeled Learning through Negative Selection and Imbalance-aware Classification. Deep Convolutional Neural Network Applied to Quality Assessment for Video Tracking. Deep Discrimin …

Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference

Title Uncertainty aware audiovisual activity recognition using deep Bayesian variational inference
Authors Mahesh Subedar, Ranganath Krishnan, Paulo Lopez Meyer, Omesh Tickoo, Jonathan Huang
Abstract Deep neural networks (DNNs) provide state-of-the-art results for a multitude of applications, but the approaches using DNNs for multimodal audiovisual applications do not consider predictive uncertainty associated with individual modalities. Bayesian deep learning methods provide principled confidence and quantify predictive uncertainty. Our contribution in this work is to propose an uncertainty aware multimodal Bayesian fusion framework for activity recognition. We demonstrate a novel approach that combines deterministic and variational layers to scale Bayesian DNNs to deeper architectures. Our experiments using in- and out-of-distribution samples selected from a subset of the Moments-in-Time (MiT) dataset show a more reliable confidence measure as compared to the non-Bayesian baseline and the Monte Carlo dropout (MC dropout) approximate Bayesian inference. We also demonstrate that the uncertainty estimates obtained from the proposed framework can identify out-of-distribution data on the UCF101 and MiT datasets. In the multimodal setting, the proposed framework improved precision-recall AUC by 10.2% on the subset of the MiT dataset as compared to the non-Bayesian baseline.
Tasks Activity Recognition, Bayesian Inference, Multimodal Activity Recognition
Published 2018-11-27
URL https://arxiv.org/abs/1811.10811v3
PDF https://arxiv.org/pdf/1811.10811v3.pdf
PWC https://paperswithcode.com/paper/uncertainty-aware-multimodal-activity
Repo
Framework
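
The abstract above relies on uncertainty estimates computed from stochastic forward passes. As a minimal sketch, assuming predictive entropy of the averaged class probabilities is the uncertainty measure (the abstract does not name the exact measure, and `predictive_entropy` and the sample values are hypothetical), the idea could look like this:

```python
import math

def predictive_entropy(mc_probs):
    """Entropy of the mean class distribution over Monte Carlo passes.

    mc_probs: list of T probability vectors, one per stochastic forward pass.
    Assumption: entropy of the averaged prediction is used as the uncertainty score.
    """
    num_passes = len(mc_probs)
    num_classes = len(mc_probs[0])
    # Average the class probabilities over all passes.
    mean = [sum(p[c] for p in mc_probs) / num_passes for c in range(num_classes)]
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(q * math.log(q) for q in mean if q > 0.0)

# Hypothetical samples: passes that agree vs. passes that disagree.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
uncertain = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]

print(predictive_entropy(confident), predictive_entropy(uncertain))
```

Inputs whose passes disagree get a higher score, which is the kind of signal one might threshold to flag out-of-distribution samples.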

Camera-based vehicle velocity estimation from monocular video

Title Camera-based vehicle velocity estimation from monocular video
Authors Moritz Kampelmühler, Michael G. Müller, Christoph Feichtenhofer
Abstract This paper documents the winning entry at the CVPR2017 vehicle velocity estimation challenge. Velocity estimation is an emerging task in autonomous driving which has not yet been thoroughly explored. The goal is to estimate the relative velocity of a specific vehicle from a sequence of images. In this paper, we present a light-weight approach for directly regressing vehicle velocities from their trajectories using a multilayer perceptron. Another contribution is an explorative study of features for monocular vehicle velocity estimation. We find that light-weight trajectory based features outperform depth and motion cues extracted from deep ConvNets, especially for far-distance predictions where current disparity and optical flow estimators are challenged significantly. Our light-weight approach is real-time capable on a single CPU and outperforms all competing entries in the velocity estimation challenge. On the test set, we report an average error of 1.12 m/s which is comparable to a (ground-truth) system that combines LiDAR and radar techniques to achieve an error of around 0.71 m/s.
Tasks Autonomous Driving, Optical Flow Estimation
Published 2018-02-20
URL http://arxiv.org/abs/1802.07094v1
PDF http://arxiv.org/pdf/1802.07094v1.pdf
PWC https://paperswithcode.com/paper/camera-based-vehicle-velocity-estimation-from
Repo
Framework

Positive and Unlabeled Learning through Negative Selection and Imbalance-aware Classification

Title Positive and Unlabeled Learning through Negative Selection and Imbalance-aware Classification
Authors Marco Frasca, Nicolò Cesa-Bianchi
Abstract Motivated by applications in protein function prediction, we consider a challenging supervised classification setting in which positive labels are scarce and there are no explicit negative labels. The learning algorithm must thus select which unlabeled examples to use as negative training points, possibly ending up with an unbalanced learning problem. We address these issues by proposing an algorithm that combines active learning (for selecting negative examples) with imbalance-aware learning (for mitigating the label imbalance). In our experiments we observe that these two techniques operate synergistically, outperforming state-of-the-art methods on standard protein function prediction benchmarks.
Tasks Active Learning, Protein Function Prediction
Published 2018-05-18
URL http://arxiv.org/abs/1805.07331v2
PDF http://arxiv.org/pdf/1805.07331v2.pdf
PWC https://paperswithcode.com/paper/positive-and-unlabeled-learning-through
Repo
Framework

Deep Convolutional Neural Network Applied to Quality Assessment for Video Tracking

Title Deep Convolutional Neural Network Applied to Quality Assessment for Video Tracking
Authors Roger Gomez Nieto, Eugenio Tamura Morimitsu
Abstract Surveillance videos often suffer from blur and exposure distortions that occur during acquisition and storage, which can adversely affect subsequent automatic image analysis in video-analytics tasks. The purpose of this paper is to deploy an algorithm that can automatically assess the presence of exposure distortion in videos. In this work we design and build a deep learning architecture for recognizing distortions in a video. The goal is to determine whether the video presents exposure distortions. Such an algorithm could be used for image enhancement or restoration, or to create a distortion-aware object tracker.
Tasks
Published 2018-10-26
URL http://arxiv.org/abs/1810.11550v1
PDF http://arxiv.org/pdf/1810.11550v1.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-neural-network-applied-to
Repo
Framework

Deep Discriminative Latent Space for Clustering

Title Deep Discriminative Latent Space for Clustering
Authors Elad Tzoreff, Olga Kogan, Yoni Choukroun
Abstract Clustering is one of the most fundamental tasks in data analysis and machine learning. It is central to many data-driven applications that aim to separate the data into groups with similar patterns. Moreover, clustering is a complex procedure that is affected significantly by the choice of the data representation method. Recent research has demonstrated encouraging clustering results by effectively learning these representations. In most of these works a deep auto-encoder is initially pre-trained to minimize a reconstruction loss, and then jointly optimized with clustering centroids in order to improve the clustering objective. Those works focus mainly on the clustering phase of the procedure, while not utilizing the potential benefit of the initial phase. In this paper we propose to optimize an auto-encoder with respect to a discriminative pairwise loss function during the auto-encoder pre-training phase. We demonstrate the high accuracy obtained by the proposed method as well as its rapid convergence (e.g. reaching above 92% accuracy on MNIST during the pre-training phase, in less than 50 epochs), even with small networks.
Tasks
Published 2018-05-28
URL http://arxiv.org/abs/1805.10795v1
PDF http://arxiv.org/pdf/1805.10795v1.pdf
PWC https://paperswithcode.com/paper/deep-discriminative-latent-space-for
Repo
Framework

Making Agents’ Abilities Explicit

Title Making Agents’ Abilities Explicit
Authors Yedi Zhang, Fu Song, Taolue Chen
Abstract Alternating-time temporal logics (ATL/ATL*) represent a family of modal logics for reasoning about agents’ strategic abilities in multiagent systems (MAS). The interpretations of ATL/ATL* over the semantic model Concurrent Game Structures (CGS) usually vary depending on the agents’ abilities, for instance, perfect vs. imperfect information, perfect vs. imperfect recall, resulting in a variety of variants which have been studied extensively in the literature. However, they are defined at the semantic level, which may limit modeling flexibility and may give counter-intuitive interpretations. To mitigate these issues, in this work, we propose to extend CGS with agents’ abilities and study the new semantics of ATL/ATL* under this model. We give PSPACE/2EXPTIME model-checking algorithms for ATL/ATL* and implement them as a prototype tool. Experimental results show the practical feasibility of the approach.
Tasks
Published 2018-11-27
URL http://arxiv.org/abs/1811.10901v1
PDF http://arxiv.org/pdf/1811.10901v1.pdf
PWC https://paperswithcode.com/paper/making-agents-abilities-explicit
Repo
Framework

Edge Attention-based Multi-Relational Graph Convolutional Networks

Title Edge Attention-based Multi-Relational Graph Convolutional Networks
Authors Chao Shang, Qinqing Liu, Ko-Shin Chen, Jiangwen Sun, Jin Lu, Jinfeng Yi, Jinbo Bi
Abstract A graph convolutional network (GCN) is a generalization of the convolutional neural network (CNN) to work with arbitrarily structured graphs. A binary adjacency matrix is commonly used in training a GCN. Recently, the attention mechanism allows the network to learn a dynamic and adaptive aggregation of the neighborhood. We propose a new GCN model on graphs where edges are characterized in multiple views or, more precisely, in terms of multiple relationships. For instance, in chemical graph theory, compound structures are often represented by the hydrogen-depleted molecular graph where nodes correspond to atoms and edges correspond to chemical bonds. Multiple attributes can be important to characterize chemical bonds, such as atom pair (the types of atoms that a bond connects), aromaticity, and whether a bond is in a ring. The different attributes lead to different graph representations for the same molecule. There is growing interest in both the chemistry and machine learning fields in directly learning molecular properties of compounds from the molecular graph, instead of from fingerprints predefined by chemists. The proposed GCN model, which we call edge attention-based multi-relational GCN (EAGCN), jointly learns attention weights and node features in graph convolution. For each bond attribute, a real-valued attention matrix is used to replace the binary adjacency matrix. By designing a dictionary for the edge attention, and forming the attention matrix of each molecule by looking up the dictionary, the EAGCN exploits correspondence between bonds in different molecules. The prediction of compound properties is based on the aggregated node features, which is independent of the varying molecule (graph) size. We demonstrate the efficacy of the EAGCN on multiple chemical datasets: Tox21, HIV, Freesolv, and Lipophilicity, and interpret the resultant attention weights.
Tasks
Published 2018-02-14
URL http://arxiv.org/abs/1802.04944v2
PDF http://arxiv.org/pdf/1802.04944v2.pdf
PWC https://paperswithcode.com/paper/edge-attention-based-multi-relational-graph
Repo
Framework
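
The dictionary lookup described in the abstract above, where a real-valued attention matrix replaces the binary adjacency matrix, can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the bond labels, and the weight values are hypothetical, the weights would be learned in practice, and the aggregation is a plain weighted sum rather than the paper's full layer.

```python
def attention_matrix(bond_types, attention_dict):
    """Build a real-valued attention matrix by looking up each bond's attribute.

    bond_types[i][j] holds the attribute value of the bond between atoms i and j,
    or None when the atoms are not bonded (entry becomes 0.0).
    """
    n = len(bond_types)
    return [
        [attention_dict[bond_types[i][j]] if bond_types[i][j] is not None else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def aggregate(att, features):
    """One propagation step: each node sums its neighbors' features, weighted by attention."""
    n = len(att)
    dim = len(features[0])
    return [
        [sum(att[i][j] * features[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]

# Hypothetical 3-atom chain: atoms 0-1 share a single bond, atoms 1-2 a double bond.
bond_types = [
    [None, "single", None],
    ["single", None, "double"],
    [None, "double", None],
]
attention_dict = {"single": 0.5, "double": 2.0}  # illustrative values, not learned
features = [[1.0], [2.0], [3.0]]

att = attention_matrix(bond_types, attention_dict)
print(aggregate(att, features))
```

Because the same dictionary is shared across molecules, a bond with a given attribute value receives the same weight wherever it appears, which is how the lookup ties together corresponding bonds in different graphs.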

A Second Order Cumulant Spectrum Test That a Stochastic Process is Strictly Stationary and a Step Toward a Test for Graph Signal Strict Stationarity

Title A Second Order Cumulant Spectrum Test That a Stochastic Process is Strictly Stationary and a Step Toward a Test for Graph Signal Strict Stationarity
Authors Denisa Roberts, Douglas Patterson
Abstract This article develops a statistical test for the null hypothesis of strict stationarity of a discrete time stochastic process in the frequency domain. When the null hypothesis is true, the second order cumulant spectrum is zero at all the discrete Fourier frequency pairs in the principal domain. The test uses a window averaged sample estimate of the second order cumulant spectrum to build a test statistic with an asymptotic complex standard normal distribution. We derive the test statistic, study the properties of the test and demonstrate its application using $^{137}$Cs gamma ray decay data. Future areas of research include testing for strict stationarity of graph signals, with applications in learning convolutional neural networks on graphs, denoising, and inpainting.
Tasks Denoising
Published 2018-01-20
URL https://arxiv.org/abs/1801.06727v2
PDF https://arxiv.org/pdf/1801.06727v2.pdf
PWC https://paperswithcode.com/paper/a-second-order-cumulant-spectrum-based-test
Repo
Framework

Deciding the status of controversial phonemes using frequency distributions; an application to semiconsonants in Spanish

Title Deciding the status of controversial phonemes using frequency distributions; an application to semiconsonants in Spanish
Authors Manuel Ortega-Rodríguez, Hugo Solís-Sánchez, Ricardo Gamboa-Alfaro
Abstract Exploiting the fact that natural languages are complex systems, the present exploratory article proposes a direct method based on frequency distributions that may be useful when making a decision on the status of problematic phonemes, an open problem in linguistics. The main notion is that natural languages, which can be considered from a complex outlook as information processing machines, and which somehow manage to set appropriate levels of redundancy, already “made the choice” whether a linguistic unit is a phoneme or not, and this would be reflected in a greater smoothness in a frequency versus rank graph. For the particular case we chose to study, we conclude that it is reasonable to consider the Spanish semiconsonant /w/ as a separate phoneme from its vowel counterpart /u/, on the one hand, and possibly also the semiconsonant /j/ as a separate phoneme from its vowel counterpart /i/, on the other. As language has been so central a topic in the study of complexity, this discussion grants us, in addition, an opportunity to gain insight into emerging properties in the broader complex systems debate.
Tasks
Published 2018-08-22
URL http://arxiv.org/abs/1808.07166v1
PDF http://arxiv.org/pdf/1808.07166v1.pdf
PWC https://paperswithcode.com/paper/deciding-the-status-of-controversial-phonemes
Repo
Framework
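
The frequency versus rank comparison in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the helper names and the toy phone sequence are hypothetical, and the abstract does not specify how smoothness of the curve is measured, so the sketch stops at producing the two distributions to compare.

```python
from collections import Counter

def rank_frequency(symbols):
    """Return (rank, symbol, relative frequency) tuples, most frequent first."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sort by descending count; break ties alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, sym, count / total) for rank, (sym, count) in enumerate(ordered, start=1)]

def merge_symbols(symbols, mapping):
    """Relabel symbols, e.g. to treat a semiconsonant as its vowel counterpart."""
    return [mapping.get(s, s) for s in symbols]

# Hypothetical toy transcription; real input would be a phonemic corpus.
phones = ["a", "w", "e", "u", "a", "e", "a"]

split_view = rank_frequency(phones)                               # /w/ counted separately
merged_view = rank_frequency(merge_symbols(phones, {"w": "u"}))   # /w/ folded into /u/

print(split_view)
print(merged_view)
```

One would then compare the two curves on a real corpus and ask which inventory gives the smoother frequency versus rank graph.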

Deep Neural Networks for Pattern Recognition

Title Deep Neural Networks for Pattern Recognition
Authors Kyongsik Yun, Alexander Huyen, Thomas Lu
Abstract In the field of pattern recognition research, the use of deep neural networks, enabled by improved computing hardware, has recently attracted attention because of their superior accuracy compared to conventional methods. Deep neural networks simulate the human visual system and achieve human-equivalent accuracy in image classification, object detection, and segmentation. This chapter introduces the basic structure of deep neural networks that simulate human neural networks. Then we identify the operational processes and applications of conditional generative adversarial networks, which are being actively researched based on the bottom-up and top-down mechanisms, the most important functions of the human visual perception process. Finally, recent developments in training strategies for effective learning of complex deep neural networks are addressed.
Tasks Image Classification, Object Detection
Published 2018-09-25
URL http://arxiv.org/abs/1809.09645v1
PDF http://arxiv.org/pdf/1809.09645v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-networks-for-pattern-recognition
Repo
Framework

Density Weighted Connectivity of Grass Pixels in Image Frames for Biomass Estimation

Title Density Weighted Connectivity of Grass Pixels in Image Frames for Biomass Estimation
Authors Ligang Zhang, Brijesh Verma, David Stockwell, Sujan Chowdhury
Abstract Accurate estimation of the biomass of roadside grasses plays a significant role in applications such as fire-prone region identification. Current solutions heavily depend on field surveys, remote sensing measurements and image processing using reference markers, which often demand big investments of time, effort and cost. This paper proposes Density Weighted Connectivity of Grass Pixels (DWCGP) to automatically estimate grass biomass from roadside image data. The DWCGP calculates the length of continuously connected grass pixels along a vertical orientation in each image column, and then weights the length by the grass density in a surrounding region of the column. Grass pixels are classified using feedforward artificial neural networks and the dominant texture orientation at every pixel is computed using multi-orientation Gabor wavelet filter vote. Evaluations on a field survey dataset show that the DWCGP reduces Root-Mean-Square Error from 5.84 to 5.52 by additionally considering grass density on top of grass height. The DWCGP shows robustness to non-vertical grass stems and to changes of both Gabor filter parameters and surrounding region widths. It also has performance close to human observation and higher than eight baseline approaches, as well as promising results for classifying low vs. high fire risk and identifying fire-prone road regions.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07512v1
PDF http://arxiv.org/pdf/1802.07512v1.pdf
PWC https://paperswithcode.com/paper/density-weighted-connectivity-of-grass-pixels
Repo
Framework
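
The two steps the abstract above describes for DWCGP, measuring the vertical run of connected grass pixels in each column and weighting it by grass density in a surrounding region, can be sketched roughly as below. This is a simplified illustration under assumptions not stated in the abstract: the grass mask is taken as given, the run is the longest one in the column, the region is a band of neighboring columns, and all function names are hypothetical.

```python
def longest_vertical_run(mask, col):
    """Length of the longest run of consecutive grass pixels (1s) in one column."""
    best = run = 0
    for row in mask:
        run = run + 1 if row[col] else 0
        best = max(best, run)
    return best

def region_density(mask, col, half_width):
    """Fraction of grass pixels in the columns within half_width of col."""
    lo = max(0, col - half_width)
    hi = min(len(mask[0]), col + half_width + 1)
    cells = [row[c] for row in mask for c in range(lo, hi)]
    return sum(cells) / len(cells)

def dwcgp_per_column(mask, half_width=1):
    """Density-weighted connected length for every column of a binary grass mask."""
    return [
        longest_vertical_run(mask, c) * region_density(mask, c, half_width)
        for c in range(len(mask[0]))
    ]

# Hypothetical 4x3 binary grass mask (1 = grass pixel), top row first.
mask = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
]

scores = dwcgp_per_column(mask)
print(scores)
```

In the paper the grass mask comes from a neural network classifier and the orientation from Gabor filters; here both are replaced by a hand-written mask to keep the weighting step visible.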

Conditional Network Embeddings

Title Conditional Network Embeddings
Authors Bo Kang, Jefrey Lijffijt, Tijl De Bie
Abstract Network Embeddings (NEs) map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that ‘similar’ nodes are mapped onto nearby points, such that the NE can be used for purposes such as link prediction (if ‘similar’ means ‘being more likely to be connected’) or classification (if ‘similar’ means ‘being more likely to have the same label’). In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose to create Conditional Network Embeddings (CNEs): embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. We demonstrate that CNEs are superior for link prediction and multi-label classification when compared to state-of-the-art methods, and this without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNE for network visualization.
Tasks Link Prediction, Multi-Label Classification
Published 2018-05-19
URL http://arxiv.org/abs/1805.07544v3
PDF http://arxiv.org/pdf/1805.07544v3.pdf
PWC https://paperswithcode.com/paper/conditional-network-embeddings
Repo
Framework

On Modular Training of Neural Acoustics-to-Word Model for LVCSR

Title On Modular Training of Neural Acoustics-to-Word Model for LVCSR
Authors Zhehuai Chen, Qi Liu, Hao Li, Kai Yu
Abstract End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous works mostly focus on E2E training of a single model which integrates acoustic and language models into a whole. Although E2E training benefits from sequence modeling and simplified decoding pipelines, a large amount of transcribed acoustic data is usually required, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework of E2E ASR is proposed to separately train neural acoustic and language models during the training stage, while still performing end-to-end inference in the decoding stage. Here, an acoustics-to-phoneme model (A2P) and a phoneme-to-word model (P2W) are trained using acoustic data and text data respectively. A phone synchronous decoding (PSD) module is inserted between A2P and P2W to reduce sequence lengths without precision loss. Finally, modules are integrated into an acoustics-to-word model (A2W) and jointly optimized using acoustic data to retain the advantage of sequence modeling. Experiments on a 300-hour Switchboard task show significant improvement over the direct A2W model. The efficiency in both training and decoding also benefits from the proposed method.
Tasks Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-03-03
URL http://arxiv.org/abs/1803.01090v1
PDF http://arxiv.org/pdf/1803.01090v1.pdf
PWC https://paperswithcode.com/paper/on-modular-training-of-neural-acoustics-to
Repo
Framework

Estimating Graphlet Statistics via Lifting

Title Estimating Graphlet Statistics via Lifting
Authors Kirill Paramonov, James Sharpnack
Abstract Exploratory analysis over network data is often limited by our ability to efficiently calculate graph statistics, which can provide a model-free understanding of macroscopic properties of a network. This work introduces a framework for estimating the graphlet count - the number of occurrences of a small subgraph motif (e.g. a wedge or a triangle) in the network. For massive graphs, where accessing the whole graph is not possible, the only viable algorithms are those which act locally by making a limited number of vertex neighborhood queries. We introduce a Monte Carlo sampling technique for graphlet counts, called lifting, which can simultaneously sample all graphlets of size up to $k$ vertices. We outline three variants of lifted graphlet counts: the ordered, unordered, and shotgun estimators. We prove that our graphlet count updates are unbiased for the true graphlet count, have low correlation between samples, and have a controlled variance. We compare the experimental performance of lifted graphlet counts to the state-of-the-art graphlet sampling procedures: Waddling and the pairwise subgraph random walk.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08736v1
PDF http://arxiv.org/pdf/1802.08736v1.pdf
PWC https://paperswithcode.com/paper/estimating-graphlet-statistics-via-lifting
Repo
Framework
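
To make the notion of an unbiased Monte Carlo graphlet count concrete, here is a deliberately naive sketch for triangles. It is not the lifting procedure from the abstract above: it samples vertex triples uniformly, which requires access to the whole vertex set, whereas lifting works through local neighborhood queries. The function names and the example graph are hypothetical.

```python
import itertools
import math
import random

def exact_triangles(adj):
    """Count triangles by checking every vertex triple (feasible for small graphs only)."""
    return sum(
        1
        for u, v, w in itertools.combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )

def sampled_triangles(adj, num_samples, rng):
    """Naive unbiased estimate: fraction of uniform triples that are triangles, times C(n, 3)."""
    nodes = sorted(adj)
    hits = 0
    for _ in range(num_samples):
        u, v, w = rng.sample(nodes, 3)  # three distinct vertices, uniformly at random
        if v in adj[u] and w in adj[u] and w in adj[v]:
            hits += 1
    return math.comb(len(nodes), 3) * hits / num_samples

# Hypothetical graph: one triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(exact_triangles(adj))
print(sampled_triangles(adj, 1000, random.Random(0)))
```

On sparse graphs most uniform triples contain no triangle, so this estimator has high variance; that weakness is one motivation for neighborhood-based samplers such as the one the abstract proposes.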

Embedded polarizing filters to separate diffuse and specular reflection

Title Embedded polarizing filters to separate diffuse and specular reflection
Authors Laurent Valentin Jospin, Gilles Baechler, Adam Scholefield
Abstract Polarizing filters provide a powerful way to separate diffuse and specular reflection; however, traditional methods rely on several captures and require proper alignment of the filters. Recently, camera manufacturers have proposed to embed polarizing micro-filters in front of the sensor, creating a mosaic of pixels with different polarizations. In this paper, we investigate the advantages of such camera designs. In particular, we consider different design patterns for the filter arrays and propose an algorithm to demosaic an image generated by such cameras. This essentially allows us to separate the diffuse and specular components using a single image. The performance of our algorithm is compared with a color-based method using synthetic and real data. Finally, we demonstrate how we can recover the normals of a scene using the diffuse images estimated by our method.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02608v1
PDF http://arxiv.org/pdf/1811.02608v1.pdf
PWC https://paperswithcode.com/paper/embedded-polarizing-filters-to-separate
Repo
Framework
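
Once a polarization mosaic has been demosaiced into four per-pixel intensities, one common textbook way to split diffuse from specular light is through the linear Stokes parameters. The sketch below is not the paper's algorithm; it assumes polarizer angles of 0°, 45°, 90° and 135°, assumes diffuse light is unpolarized and specular light is fully linearly polarized, and uses a hypothetical function name.

```python
import math

def separate_reflection(i0, i45, i90, i135):
    """Split one pixel into (diffuse, specular) intensity from four polarizer readings.

    Assumptions: diffuse reflection is unpolarized and specular reflection is
    fully linearly polarized; readings are taken at 0, 45, 90 and 135 degrees.
    """
    s0 = (i0 + i45 + i90 + i135) / 2.0   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical preference
    s2 = i45 - i135                      # diagonal preference
    polarized = math.hypot(s1, s2)       # equals I_max - I_min
    i_min = (s0 - polarized) / 2.0       # reading with the specular part blocked
    diffuse = 2.0 * i_min                # a polarizer halves unpolarized light
    specular = polarized
    return diffuse, specular

# Hypothetical pixel: diffuse intensity 4 plus specular intensity 6 polarized at 0 degrees,
# so each reading is 2 + 6 * cos^2(angle).
print(separate_reflection(8.0, 5.0, 2.0, 5.0))
```

Real diffuse reflection is often partially polarized, so this split is only a first approximation.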