Paper Group ANR 305
Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech. Geometric Enclosing Networks. Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks. Stein Variational Gradient Descent as Gradient Flow. Analysing Data-To-Text Gener …
Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech
Title | Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech |
Authors | Michael Neumann, Ngoc Thang Vu |
Abstract | Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training a system. In this work, we conduct extensive experiments using an attentive convolutional neural network with a multi-view learning objective function. We compare system performance across different lengths of the input signal, different types of acoustic features, and different types of emotional speech (improvised/scripted). Our experimental results on the Interactive Emotional Motion Capture (IEMOCAP) database reveal that the recognition performance strongly depends on the type of speech data, independent of the choice of input features. Furthermore, we achieved state-of-the-art results on the improvised speech data of IEMOCAP. |
Tasks | Emotion Recognition, Motion Capture, MULTI-VIEW LEARNING, Speech Emotion Recognition |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00612v1 |
http://arxiv.org/pdf/1706.00612v1.pdf | |
PWC | https://paperswithcode.com/paper/attentive-convolutional-neural-network-based |
Repo | |
Framework | |
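The attention-over-time pooling at the heart of such attentive CNNs can be illustrated in a few lines. The PyTorch sketch below is a minimal stand-in, not the paper's architecture: the filter sizes, feature dimensions and number of emotion classes are illustrative assumptions, and the multi-view objective is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentivePoolingCNN(nn.Module):
    """Illustrative attentive CNN for utterance-level emotion recognition:
    1D convolution over frame-level acoustic features, softmax attention
    weights over frames, weighted pooling, then a linear classifier."""
    def __init__(self, n_features=26, n_classes=4, channels=64):
        super().__init__()
        self.conv = nn.Conv1d(n_features, channels, kernel_size=5, padding=2)
        self.attn = nn.Linear(channels, 1)          # one attention score per frame
        self.cls = nn.Linear(channels, n_classes)

    def forward(self, x):                            # x: (batch, n_features, frames)
        h = F.relu(self.conv(x))                     # (batch, channels, frames)
        h = h.transpose(1, 2)                        # (batch, frames, channels)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights over frames
        pooled = (w * h).sum(dim=1)                  # (batch, channels)
        return self.cls(pooled)

# usage: logits = AttentivePoolingCNN()(torch.randn(8, 26, 300))
```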
Geometric Enclosing Networks
Title | Geometric Enclosing Networks |
Authors | Trung Le, Hung Vu, Tu Dinh Nguyen, Dinh Phung |
Abstract | Training a model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh idea that borrows the principle of the minimal enclosing ball to train a generator $G(\mathbf{z})$ in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective, so that its inverse from feature space to data space yields expressive nonlinear contours describing the data manifold, hence ensuring that generated data also lie on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of the generated data. |
Tasks | |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04733v2 |
http://arxiv.org/pdf/1708.04733v2.pdf | |
PWC | https://paperswithcode.com/paper/geometric-enclosing-networks |
Repo | |
Framework | |
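For context, the minimal-enclosing-ball principle the abstract borrows is classically written as a soft-margin sphere in feature space (in the style of support vector data description); a generator can then be penalized for mapping outside that sphere. The formulation below is a schematic of that idea, not GEN's exact objective.

```latex
% Soft-margin minimal enclosing ball over mapped training data x_1, ..., x_n:
\min_{R,\,c,\,\xi}\; R^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad \lVert\phi(x_i)-c\rVert^{2}\le R^{2}+\xi_i,\;\; \xi_i\ge 0.

% A generator G(z) can then be trained so that generated points land inside the same sphere:
\min_{G}\;\mathbb{E}_{z}\!\left[\max\!\left(0,\;\lVert\phi(G(z))-c\rVert^{2}-R^{2}\right)\right].
```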
Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks
Title | Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks |
Authors | Ruoxi Deng, Tianqi Zhao, Chunhua Shen, Shengjun Liu |
Abstract | We study the problem of estimating the relative depth order of point pairs in a monocular image. Recent advances mainly focus on using deep convolutional neural networks (DCNNs) to learn and infer the ordinal information from multiple contextual cues of the point pair, such as the global scene context, local contextual information, and the locations. However, it remains unclear how much each context contributes to the task. To address this, we first examine the contribution of each context cue [1], [2] to the performance of depth order estimation. We find that the local context surrounding the point pair contributes the most, while the global scene context helps little. Based on these findings, we propose a simple method that uses a multi-scale densely-connected network to tackle the task. Instead of learning the global structure, we focus on exploring the local structure by learning to regress from regions of multiple sizes around the point pairs. Moreover, we use the recently proposed densely connected network [3] to encourage substantial feature reuse as well as to deepen our network to boost the performance. We show in experiments that the results of our approach are on par with or better than those of the state-of-the-art methods, with the benefit of using only a small amount of training data. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08063v2 |
http://arxiv.org/pdf/1707.08063v2.pdf | |
PWC | https://paperswithcode.com/paper/relative-depth-order-estimation-using-multi |
Repo | |
Framework | |
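A minimal sketch of the local, multi-scale input construction the abstract describes: crop regions of several sizes around the point pair and feed each to a densely connected branch. The scales, output size and nearest-neighbor resampling below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def multiscale_patches(image, p1, p2, scales=(16, 32, 64), out_size=64):
    """Crop square regions of several sizes centered on the midpoint of the
    point pair -- the local context found to matter most -- and resize them
    to a common resolution for the multi-branch network."""
    cy, cx = (p1[0] + p2[0]) // 2, (p1[1] + p2[1]) // 2
    h, w = image.shape[:2]
    patches = []
    for s in scales:
        y0, y1 = max(0, cy - s), min(h, cy + s)
        x0, x1 = max(0, cx - s), min(w, cx + s)
        crop = image[y0:y1, x0:x1]
        # nearest-neighbor resize to out_size x out_size (stand-in for a proper resampler)
        ys = np.linspace(0, crop.shape[0] - 1, out_size).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_size).astype(int)
        patches.append(crop[np.ix_(ys, xs)])
    return patches  # each patch feeds one branch of the densely connected network
```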
Stein Variational Gradient Descent as Gradient Flow
Title | Stein Variational Gradient Descent as Gradient Flow |
Authors | Qiang Liu |
Abstract | Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on an efficient gradient-based update that is guaranteed to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator. We also provide a number of results on the Stein operator and Stein's identity using the notion of weak derivative, including a new proof of the distinguishability of the Stein discrepancy under weak conditions. |
Tasks | |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07520v2 |
http://arxiv.org/pdf/1704.07520v2.pdf | |
PWC | https://paperswithcode.com/paper/stein-variational-gradient-descent-as |
Repo | |
Framework | |
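For reference, the particle update analyzed here is the standard SVGD step from Liu and Wang's original algorithm. The NumPy sketch below implements it with an RBF kernel on a toy Gaussian target; the fixed bandwidth and step size are illustrative simplifications (the original method uses a median-heuristic bandwidth).

```python
import numpy as np

def svgd_step(X, grad_logp, h=1.0, eps=0.1):
    """One SVGD update on particles X of shape (n, d).
    grad_logp maps (n, d) particle positions to (n, d) score values."""
    diff = X[:, None, :] - X[None, :, :]                       # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))     # RBF kernel matrix
    drive = K @ grad_logp(X)                                   # sum_j k(x_j, x_i) grad log p(x_j)
    repulse = np.sum(K[:, :, None] * diff, axis=1) / h ** 2    # sum_j grad_{x_j} k(x_j, x_i)
    return X + eps * (drive + repulse) / X.shape[0]

# toy target: standard 2-D Gaussian, for which grad log p(x) = -x
particles = np.random.randn(100, 2) * 3.0 + 5.0
for _ in range(500):
    particles = svgd_step(particles, lambda X: -X)
```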
Analysing Data-To-Text Generation Benchmarks
Title | Analysing Data-To-Text Generation Benchmarks |
Authors | Laura Perez-Beltrachini, Claire Gardent |
Abstract | Recently, several datasets associating data to text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these datasets is linguistically challenging. Do these datasets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these datasets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03802v1 |
http://arxiv.org/pdf/1705.03802v1.pdf | |
PWC | https://paperswithcode.com/paper/analysing-data-to-text-generation-benchmarks |
Repo | |
Framework | |
Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models
Title | Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models |
Authors | Aditya Gangrade, Bobak Nazer, Venkatesh Saligrama |
Abstract | The change detection problem is to determine if the Markov network structures of two Markov random fields differ from one another given two sets of samples drawn from the respective underlying distributions. We study the trade-off between the sample sizes and the reliability of change detection, measured as a minimax risk, for the important cases of the Ising models and the Gaussian Markov random fields restricted to the models which have network structures with $p$ nodes and degree at most $d$, and obtain information-theoretic lower bounds for reliable change detection over these models. We show that for the Ising model, $\Omega\left(\frac{d^2}{(\log d)^2}\log p\right)$ samples are required from each dataset to detect even the sparsest possible changes, and that for the Gaussian, $\Omega\left( \gamma^{-2} \log(p)\right)$ samples are required from each dataset to detect change, where $\gamma$ is the smallest ratio of off-diagonal to diagonal terms in the precision matrices of the distributions. These bounds are compared to the corresponding results in structure learning, and closely match them under mild conditions on the model parameters. Thus, our change detection bounds inherit partial tightness from the structure learning schemes in previous literature, demonstrating that in certain parameter regimes, the naive structure learning based approach to change detection is minimax optimal up to constant factors. |
Tasks | |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10366v1 |
http://arxiv.org/pdf/1710.10366v1.pdf | |
PWC | https://paperswithcode.com/paper/lower-bounds-for-two-sample-structural-change |
Repo | |
Framework | |
End-to-end 3D shape inverse rendering of different classes of objects from a single input image
Title | End-to-end 3D shape inverse rendering of different classes of objects from a single input image |
Authors | Shima Kamyab, S. Zohreh Azimifar |
Abstract | In this paper, a semi-supervised deep framework is proposed for the problem of 3D shape inverse rendering from a single 2D input image. The main structure of the proposed framework consists of unsupervised pre-trained components, which significantly reduce the need for labeled data when training the whole framework. Using labeled data has the advantage of achieving accurate results without the need for predefined assumptions about the image formation process. Three main components are used in the proposed network: an encoder which maps the 2D input image to a representation space, a 3D decoder which decodes a representation to a 3D structure, and a mapping component that maps the 2D representation to the 3D one. The only part that needs labels for training is the mapping component, which has relatively few parameters. The other components in the network can be pre-trained in an unsupervised manner using only 2D images or 3D data, respectively. The way 3D shapes are reconstructed in the decoder component, inspired by model-based methods for 3D reconstruction, maps a low-dimensional representation to the 3D shape space, with the advantage that the basis vectors of the shape space are extracted from the training data itself rather than being restricted to a small set of examples as in predefined models. Therefore, the proposed framework deals directly with the coordinate values of the point cloud representation, which leads to dense 3D shapes in the output. Experimental results on several benchmark datasets of objects and human faces, and comparison with recent similar methods, show the power of the proposed network in recovering more detail from single 2D images. |
Tasks | 3D Reconstruction |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.05858v1 |
http://arxiv.org/pdf/1711.05858v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-3d-shape-inverse-rendering-of |
Repo | |
Framework | |
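A schematic of the three-component pipeline described above, in PyTorch. The layer sizes, point-cloud resolution and fully connected blocks are illustrative assumptions; only the overall structure (pre-trainable encoder and decoder, small supervised mapping) mirrors the abstract.

```python
import torch
import torch.nn as nn

class InverseRenderer(nn.Module):
    """Schematic of the three-component design: a 2D image encoder, a small
    supervised 2D-to-3D mapping network, and a 3D point-cloud decoder. Only
    the mapping needs labeled (image, shape) pairs; the encoder and decoder
    can be pre-trained separately on 2D images and 3D shapes."""
    def __init__(self, img_dim=64 * 64, latent2d=128, latent3d=128, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.encoder = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent2d))       # unsupervised pre-training
        self.mapping = nn.Linear(latent2d, latent3d)                 # the only supervised part
        self.decoder = nn.Sequential(nn.Linear(latent3d, 512), nn.ReLU(),
                                     nn.Linear(512, n_points * 3))   # unsupervised pre-training

    def forward(self, img):                                          # img: (batch, img_dim)
        z3d = self.mapping(self.encoder(img))
        return self.decoder(z3d).view(-1, self.n_points, 3)          # dense point cloud
```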
Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning
Title | Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning |
Authors | Hamid Mirzaei, Tony Givargis |
Abstract | Recent advances in combining deep learning and Reinforcement Learning have shown a promising path for designing new control agents that can learn optimal policies for challenging control tasks. These new methods address the main limitations of conventional Reinforcement Learning methods such as customized feature engineering and small action/state space dimension requirements. In this paper, we leverage one of the state-of-the-art Reinforcement Learning methods, known as Trust Region Policy Optimization, to tackle intersection management for autonomous vehicles. We show that using this method, we can perform fine-grained acceleration control of autonomous vehicles in a grid street plan to achieve a global design objective. |
Tasks | Autonomous Vehicles, Feature Engineering |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10432v1 |
http://arxiv.org/pdf/1705.10432v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-acceleration-control-for |
Repo | |
Framework | |
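To make the problem setup concrete, the sketch below casts single-vehicle acceleration control as a continuous-action MDP of the kind TRPO can optimize. The dynamics, acceleration limits and reward shaping are illustrative stand-ins, not the paper's intersection formulation.

```python
import numpy as np

class IntersectionEnv:
    """Toy single-vehicle view of the control problem: the agent picks a
    continuous acceleration, and the reward penalizes deviation from a target
    speed and large accelerations until the vehicle clears the intersection
    zone. All dynamics and reward terms are illustrative."""
    def __init__(self, dt=0.1, target_speed=10.0):
        self.dt, self.target_speed = dt, target_speed
        self.reset()

    def reset(self):
        self.pos, self.vel = 0.0, 5.0
        return np.array([self.pos, self.vel])

    def step(self, accel):
        accel = float(np.clip(accel, -3.0, 3.0))          # physical acceleration limits
        self.vel = max(0.0, self.vel + accel * self.dt)
        self.pos += self.vel * self.dt
        reward = -abs(self.vel - self.target_speed) - 0.1 * accel ** 2
        done = self.pos >= 100.0                          # crossed the intersection zone
        return np.array([self.pos, self.vel]), reward, done
```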
On Fairness, Diversity and Randomness in Algorithmic Decision Making
Title | On Fairness, Diversity and Randomness in Algorithmic Decision Making |
Authors | Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, Adrian Weller |
Abstract | Consider a binary decision making process where a single machine learning classifier replaces a multitude of humans. We raise questions about the resulting loss of diversity in the decision making process. We study the potential benefits of using random classifier ensembles instead of a single classifier in the context of fairness-aware learning and demonstrate various attractive properties: (i) an ensemble of fair classifiers is guaranteed to be fair, for several different measures of fairness, (ii) an ensemble of unfair classifiers can still achieve fair outcomes, and (iii) an ensemble of classifiers can achieve better accuracy-fairness trade-offs than a single classifier. Finally, we introduce notions of distributional fairness to characterize further potential benefits of random classifier ensembles. |
Tasks | Decision Making |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10208v1 |
http://arxiv.org/pdf/1706.10208v1.pdf | |
PWC | https://paperswithcode.com/paper/on-fairness-diversity-and-randomness-in |
Repo | |
Framework | |
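A minimal sketch of the randomized-ensemble decision rule and one fairness measure (the demographic parity gap) of the sort discussed above. The scikit-learn-style `.predict()` interface and this particular measure are assumptions for illustration; the paper considers several fairness measures.

```python
import numpy as np

def randomized_ensemble_predict(classifiers, X, rng=None):
    """For each instance, sample one ensemble member uniformly at random and
    use its decision (the alternative is deterministic score averaging).
    Classifiers are assumed to expose a scikit-learn-style .predict()."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.integers(len(classifiers), size=len(X))
    return np.array([classifiers[i].predict(X[j:j + 1])[0] for j, i in enumerate(idx)])

def demographic_parity_gap(y_pred, group):
    """Gap in positive-decision rates between two groups encoded as 0/1."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())
```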
Self-Supervised Learning for Stereo Matching with Self-Improving Ability
Title | Self-Supervised Learning for Stereo Matching with Self-Improving Ability |
Authors | Yiran Zhong, Yuchao Dai, Hongdong Li |
Abstract | Existing deep-learning based dense stereo matching methods often rely on ground-truth disparity maps as the training signal, which are however not always available in many situations. In this paper, we design a simple convolutional neural network architecture that is able to learn to compute dense disparity maps directly from the stereo inputs. Training is performed in an end-to-end fashion without the need for ground-truth disparity maps. The idea is to use the image warping error (instead of disparity-map residuals) as the loss function to drive the learning process, aiming to find a depth map that minimizes the warping error. While this is a simple concept well known in stereo matching, making it work in a deep-learning framework requires overcoming many non-trivial challenges, and in this work we provide effective solutions. Our network is self-adaptive to different unseen imagery as well as to different camera settings. Experiments on the KITTI and Middlebury stereo benchmark datasets show that our method outperforms many state-of-the-art stereo matching methods by a margin, while being significantly faster. |
Tasks | Stereo Matching, Stereo Matching Hand |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00930v1 |
http://arxiv.org/pdf/1709.00930v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-for-stereo-matching |
Repo | |
Framework | |
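The core self-supervised signal, the image-warping (photometric) error, can be sketched compactly in PyTorch: warp the right image into the left view using the predicted disparity and penalize the reconstruction error. The L1 loss and bilinear sampling below are illustrative; the paper's full loss likely includes further terms.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the rectified right image at
    x - d(x, y). right: (B, C, H, W); disp: (B, 1, H, W) in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().unsqueeze(0).to(right.device) - disp[:, 0]        # shifted x coordinates
    ys = ys.float().unsqueeze(0).expand(b, -1, -1).to(right.device)
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)  # normalize to [-1, 1]
    return F.grid_sample(right, grid, align_corners=True)

def warping_loss(left, right, disp):
    """L1 photometric error between the real left image and its reconstruction."""
    return (left - warp_right_to_left(right, disp)).abs().mean()
```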
Detecting Semantic Parts on Partially Occluded Objects
Title | Detecting Semantic Parts on Partially Occluded Objects |
Authors | Jianyu Wang, Cihang Xie, Zhishuai Zhang, Jun Zhu, Lingxi Xie, Alan Yuille |
Abstract | In this paper, we address the task of detecting semantic parts on partially occluded objects. We consider a scenario where the model is trained using non-occluded images but tested on occluded images. The motivation is that there is an infinite number of occlusion patterns in the real world, which cannot be fully covered by the training data, so the models should be inherently robust and adaptive to occlusions instead of fitting/learning the occlusion patterns in the training data. Our approach detects semantic parts by accumulating the confidence of local visual cues. Specifically, the method uses a simple voting scheme, based on log-likelihood ratio tests and spatial constraints, to combine the evidence of local cues. These cues are called visual concepts, which are derived by clustering the internal states of deep networks. We evaluate our voting scheme on the VehicleSemanticPart dataset with dense part annotations. We randomly place two, three or four irrelevant objects onto the target object to generate testing images with various occlusions. Experiments show that our algorithm outperforms several competitors in semantic part detection when occlusions are present. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07819v1 |
http://arxiv.org/pdf/1707.07819v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-semantic-parts-on-partially |
Repo | |
Framework | |
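A toy version of the voting step described in the abstract: each detected visual concept casts a vote for the part location at a learned spatial offset, weighted by its log-likelihood ratio score. The offset model and positive-evidence thresholding below are simplified assumptions; the paper additionally enforces spatial constraints.

```python
import numpy as np

def vote_for_part(concept_hits, offset_stats, grid_shape):
    """Accumulate log-likelihood-ratio votes from detected visual concepts.
    concept_hits: list of (concept_id, y, x, llr) -- a firing of a visual
    concept at (y, x) with its log-likelihood ratio score.
    offset_stats: dict concept_id -> (dy, dx) mean offset from concept to part.
    Returns a vote map whose peak is the predicted part location."""
    votes = np.zeros(grid_shape)
    for cid, y, x, llr in concept_hits:
        dy, dx = offset_stats[cid]
        py, px = int(round(y + dy)), int(round(x + dx))
        if 0 <= py < grid_shape[0] and 0 <= px < grid_shape[1]:
            votes[py, px] += max(llr, 0.0)        # only positive evidence votes
    return votes
```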
Density-Wise Two Stage Mammogram Classification using Texture Exploiting Descriptors
Title | Density-Wise Two Stage Mammogram Classification using Texture Exploiting Descriptors |
Authors | Aditya A. Shastri, Deepti Tamrakar, Kapil Ahuja |
Abstract | Breast cancer is becoming pervasive with each passing day. Hence, its early detection is a big step in saving the life of any patient. Mammography is a common tool in breast cancer diagnosis. The most important step here is the classification of mammogram patches as normal-abnormal and benign-malignant. The texture of a breast in a mammogram patch plays a significant role in these classifications. We propose a variation of the Histogram of Gradients (HOG) and Gabor filter combination, called Histogram of Oriented Texture (HOT), that exploits this fact. We also revisit the Pass Band Discrete Cosine Transform (PB-DCT) descriptor, which captures texture information well. Not all features of a mammogram patch may be useful. Hence, we apply a feature selection technique called Discrimination Potentiality (DP). Our resulting descriptors, DP-HOT and DP-PB-DCT, are compared with the standard descriptors. The density of a mammogram patch is important for classification and has not been studied exhaustively. The Image Retrieval in Medical Applications (IRMA) database from RWTH Aachen, Germany, is a standard database that provides mammogram patches, and most researchers have tested their frameworks only on a subset of patches from this database. We apply our two new descriptors on all images of the IRMA database for density-wise classification and compare with the standard descriptors. We achieve higher accuracy than all of the existing standard descriptors (more than 92%). |
Tasks | Feature Selection, Image Retrieval |
Published | 2017-01-15 |
URL | http://arxiv.org/abs/1701.04010v4 |
http://arxiv.org/pdf/1701.04010v4.pdf | |
PWC | https://paperswithcode.com/paper/density-wise-two-stage-mammogram |
Repo | |
Framework | |
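One plausible reading of the Histogram of Oriented Texture (HOT) descriptor, sketched with scikit-image: compute HOG-style histograms on the magnitude responses of a small Gabor filter bank. The frequencies, orientations and cell sizes are assumptions; the paper's exact HOT construction may differ.

```python
import numpy as np
from skimage.feature import hog
from skimage.filters import gabor

def hot_descriptor(patch, frequencies=(0.1, 0.2), n_orient=8):
    """Illustrative HOT: HOG computed on Gabor response magnitudes.
    `patch` is a grayscale mammogram patch of at least 16x16 pixels."""
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(patch, frequency=f, theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)                       # Gabor response magnitude
            feats.append(hog(mag, orientations=9,
                             pixels_per_cell=(16, 16), cells_per_block=(1, 1)))
    return np.concatenate(feats)
```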
Deep Learning Features at Scale for Visual Place Recognition
Title | Deep Learning Features at Scale for Visual Place Recognition |
Authors | Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford |
Abstract | The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area. |
Tasks | Visual Place Recognition |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.05105v1 |
http://arxiv.org/pdf/1701.05105v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-features-at-scale-for-visual |
Repo | |
Framework | |
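Treating place recognition as classification, as the abstract describes, can be sketched as follows: train a CNN with one class per place, then reuse the penultimate embedding for matching. The tiny backbone below is purely illustrative; the paper trains full-scale CNN architectures with multi-scale feature encoding.

```python
import torch
import torch.nn as nn

class PlaceNet(nn.Module):
    """Schematic of place recognition as classification: one class per place
    during training; the penultimate embedding is reused for matching."""
    def __init__(self, n_places=1000, emb_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, emb_dim))
        self.classifier = nn.Linear(emb_dim, n_places)

    def forward(self, x):                       # x: (batch, 3, H, W)
        emb = self.backbone(x)
        return self.classifier(emb), emb

# matching: compare embeddings of two images, e.g. with
# torch.nn.functional.cosine_similarity(emb_a, emb_b)
```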
End-to-end Lung Nodule Detection in Computed Tomography
Title | End-to-end Lung Nodule Detection in Computed Tomography |
Authors | Dufan Wu, Kyungsang Kim, Bin Dong, Georges El Fakhri, Quanzheng Li |
Abstract | A computer aided diagnostic (CAD) system is crucial for modern medical imaging. But almost all CAD systems operate on reconstructed images, which were optimized for radiologists. Computer vision can capture features that are subtle to human observers, so it is desirable to design a CAD system operating on the raw data. In this paper, we propose a deep-neural-network-based detection system for lung nodule detection in computed tomography (CT). A primal-dual-type deep reconstruction network is applied first to convert the raw data to the image space, followed by a 3-dimensional convolutional neural network (3D-CNN) for the nodule detection. For efficient network training, the deep reconstruction network and the CNN detector were trained sequentially first, followed by one epoch of end-to-end fine-tuning. The method was evaluated on the Lung Image Database Consortium image collection (LIDC-IDRI) with simulated forward projections. With 144 multi-slice fanbeam projections, the proposed end-to-end detector could achieve comparable sensitivity to the reference detector, which was trained and applied on the fully-sampled image data. It also demonstrated superior detection performance compared to detectors trained on the reconstructed images. The proposed method is general and could be expanded to most detection tasks in medical imaging. |
Tasks | Computed Tomography (CT), Lung Nodule Detection |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.02074v2 |
http://arxiv.org/pdf/1711.02074v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-lung-nodule-detection-in-computed |
Repo | |
Framework | |
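The three-stage training schedule described above (reconstruction network, then detector, then one epoch of joint fine-tuning) can be sketched as follows. The losses, optimizers and single-batch data handling are illustrative placeholders; network definitions and data loading are omitted.

```python
import torch
import torch.nn.functional as F

def train_pipeline(recon_net, detector, projections, images, labels, epochs_seq=10):
    """Sequential training followed by one epoch of end-to-end fine-tuning.
    The whole dataset is treated as a single batch for brevity."""
    opt_r = torch.optim.Adam(recon_net.parameters())
    opt_d = torch.optim.Adam(detector.parameters())
    for _ in range(epochs_seq):                      # stage 1: reconstruction network alone
        loss = F.mse_loss(recon_net(projections), images)
        opt_r.zero_grad(); loss.backward(); opt_r.step()
    for _ in range(epochs_seq):                      # stage 2: detector on frozen reconstructions
        loss = F.binary_cross_entropy_with_logits(
            detector(recon_net(projections).detach()), labels)
        opt_d.zero_grad(); loss.backward(); opt_d.step()
    opt_all = torch.optim.Adam(
        list(recon_net.parameters()) + list(detector.parameters()), lr=1e-5)
    loss = F.binary_cross_entropy_with_logits(detector(recon_net(projections)), labels)
    opt_all.zero_grad(); loss.backward(); opt_all.step()   # stage 3: one pass of joint fine-tuning
```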
Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs
Title | Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs |
Authors | Mahdi Imani, Ulisses Braga-Neto |
Abstract | This paper is concerned with the problem of stochastic control of gene regulatory networks (GRNs) observed indirectly through noisy measurements and with uncertainty in the intervention inputs. The partial observability of the gene states and the uncertainty in the intervention process are accounted for by modeling GRNs using the partially-observed Boolean dynamical system (POBDS) signal model with noisy gene expression measurements. The optimal infinite-horizon control strategy for this problem is not attainable in general, and we apply reinforcement learning and Gaussian process techniques to find a near-optimal solution. The POBDS is first transformed into a directly-observed Markov decision process in a continuous belief space, and a Gaussian process is used for modeling the cost function over the belief and intervention spaces. Reinforcement learning is then used to learn the cost function from the available gene expression data. In addition, we employ sparsification, which enables the control of large partially-observed GRNs. The performance of the resulting algorithm is studied through a comprehensive set of numerical experiments using synthetic gene expression data generated from a melanoma gene regulatory network. |
Tasks | |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07652v1 |
http://arxiv.org/pdf/1702.07652v1.pdf | |
PWC | https://paperswithcode.com/paper/control-of-gene-regulatory-networks-with |
Repo | |
Framework | |
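The belief-space view underlying this approach can be illustrated with a single Bayesian filtering step: the posterior over the Boolean gene states is propagated through the intervention-dependent transition model and updated with the noisy expression measurement. The dense-matrix form below is a simplification suitable only for small networks.

```python
import numpy as np

def belief_update(belief, T_u, obs_lik):
    """One step of the belief update that turns the POBDS into a belief-space MDP.
    belief: (S,) prior over the 2^n Boolean states; T_u: (S, S) transition matrix
    under intervention u, with T_u[i, j] = P(next=j | current=i);
    obs_lik: (S,) likelihood of the noisy expression reading under each state."""
    predicted = T_u.T @ belief                 # prediction through the intervention model
    posterior = predicted * obs_lik            # measurement update
    return posterior / posterior.sum()         # normalize to a probability vector
```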