Paper Group ANR 305
Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech. Geometric Enclosing Networks. Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks. Stein Variational Gradient Descent as Gradient Flow. Analysing Data-To-Text Gener …
Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech
Title | Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech |
Authors | Michael Neumann, Ngoc Thang Vu |
Abstract | Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work has proposed a variety of models and feature sets for training a system. In this work, we conduct extensive experiments using an attentive convolutional neural network with a multi-view learning objective function. We compare system performance across different lengths of the input signal, different types of acoustic features, and different types of emotional speech (improvised/scripted). Our experimental results on the Interactive Emotional Motion Capture (IEMOCAP) database reveal that the recognition performance strongly depends on the type of speech data, independent of the choice of input features. Furthermore, we achieved state-of-the-art results on the improvised speech data of IEMOCAP. |
Tasks | Emotion Recognition, Motion Capture, MULTI-VIEW LEARNING, Speech Emotion Recognition |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00612v1 |
http://arxiv.org/pdf/1706.00612v1.pdf | |
PWC | https://paperswithcode.com/paper/attentive-convolutional-neural-network-based |
Repo | |
Framework | |
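The attention-over-time pooling at the heart of such attentive CNNs can be illustrated in a few lines. The PyTorch sketch below is a minimal stand-in, not the paper's architecture: the filter sizes, feature dimensions and number of emotion classes are illustrative assumptions, and the multi-view objective is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentivePoolingCNN(nn.Module):
    """Illustrative attentive CNN for utterance-level emotion recognition:
    1D convolution over frame-level acoustic features, softmax attention
    weights over frames, weighted pooling, then a linear classifier."""
    def __init__(self, n_features=26, n_classes=4, channels=64):
        super().__init__()
        self.conv = nn.Conv1d(n_features, channels, kernel_size=5, padding=2)
        self.attn = nn.Linear(channels, 1)          # one attention score per frame
        self.cls = nn.Linear(channels, n_classes)

    def forward(self, x):                            # x: (batch, n_features, frames)
        h = F.relu(self.conv(x))                     # (batch, channels, frames)
        h = h.transpose(1, 2)                        # (batch, frames, channels)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights over frames
        pooled = (w * h).sum(dim=1)                  # (batch, channels)
        return self.cls(pooled)

# usage: logits = AttentivePoolingCNN()(torch.randn(8, 26, 300))
```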
Geometric Enclosing Networks
Title | Geometric Enclosing Networks |
Authors | Trung Le, Hung Vu, Tu Dinh Nguyen, Dinh Phung |
Abstract | Training a model to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh idea that borrows the principle of the minimal enclosing ball to train a generator $G(\mathbf{z})$ in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective, so that its inverse from feature space to data space yields expressive nonlinear contours describing the data manifold, hence ensuring that generated data also lie on the data manifold learned from the training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of the generated data. |
Tasks | |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04733v2 |
http://arxiv.org/pdf/1708.04733v2.pdf | |
PWC | https://paperswithcode.com/paper/geometric-enclosing-networks |
Repo | |
Framework | |
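For context, the minimal-enclosing-ball principle the abstract borrows is classically written as a soft-margin sphere in feature space (in the style of support vector data description); a generator can then be penalized for mapping outside that sphere. The formulation below is a schematic of that idea, not GEN's exact objective.

```latex
% Soft-margin minimal enclosing ball over mapped training data x_1, ..., x_n:
\min_{R,\,c,\,\xi}\; R^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad \lVert\phi(x_i)-c\rVert^{2}\le R^{2}+\xi_i,\;\; \xi_i\ge 0.

% A generator G(z) can then be trained so that generated points land inside the same sphere:
\min_{G}\;\mathbb{E}_{z}\!\left[\max\!\left(0,\;\lVert\phi(G(z))-c\rVert^{2}-R^{2}\right)\right].
```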
Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks
Title | Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks |
Authors | Ruoxi Deng, Tianqi Zhao, Chunhua Shen, Shengjun Liu |
Abstract | We study the problem of estimating the relative depth order of point pairs in a monocular image. Recent advances mainly focus on using deep convolutional neural networks (DCNNs) to learn and infer the ordinal information from multiple contextual cues of the point pair, such as the global scene context, local contextual information, and the locations. However, it remains unclear how much each context contributes to the task. To address this, we first examine the contribution of each context cue [1], [2] to the performance of depth order estimation. We find that the local context surrounding the point pair contributes the most, while the global scene context helps little. Based on these findings, we propose a simple method that uses a multi-scale densely-connected network to tackle the task. Instead of learning the global structure, we focus on exploring the local structure by learning to regress from regions of multiple sizes around the point pairs. Moreover, we use the recently proposed densely connected network [3] to encourage substantial feature reuse as well as to deepen our network to boost the performance. We show in experiments that the results of our approach are on par with or better than those of the state-of-the-art methods, with the benefit of using only a small amount of training data. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08063v2 |
http://arxiv.org/pdf/1707.08063v2.pdf | |
PWC | https://paperswithcode.com/paper/relative-depth-order-estimation-using-multi |
Repo | |
Framework | |
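A minimal sketch of the local, multi-scale input construction the abstract describes: crop regions of several sizes around the point pair and feed each to a densely connected branch. The scales, output size and nearest-neighbor resampling below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def multiscale_patches(image, p1, p2, scales=(16, 32, 64), out_size=64):
    """Crop square regions of several sizes centered on the midpoint of the
    point pair -- the local context found to matter most -- and resize them
    to a common resolution for the multi-branch network."""
    cy, cx = (p1[0] + p2[0]) // 2, (p1[1] + p2[1]) // 2
    h, w = image.shape[:2]
    patches = []
    for s in scales:
        y0, y1 = max(0, cy - s), min(h, cy + s)
        x0, x1 = max(0, cx - s), min(w, cx + s)
        crop = image[y0:y1, x0:x1]
        # nearest-neighbor resize to out_size x out_size (stand-in for a proper resampler)
        ys = np.linspace(0, crop.shape[0] - 1, out_size).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_size).astype(int)
        patches.append(crop[np.ix_(ys, xs)])
    return patches  # each patch feeds one branch of the densely connected network
```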
Stein Variational Gradient Descent as Gradient Flow
Title | Stein Variational Gradient Descent as Gradient Flow |
Authors | Qiang Liu |
Abstract | Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on an efficient gradient-based update that is guaranteed to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis of SVGD, discussing its weak convergence properties and showing that its asymptotic behavior is captured by a gradient flow of the KL divergence functional under a new metric structure induced by the Stein operator. We also provide a number of results on the Stein operator and Stein's identity using the notion of weak derivative, including a new proof of the distinguishability of the Stein discrepancy under weak conditions. |
Tasks | |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07520v2 |
http://arxiv.org/pdf/1704.07520v2.pdf | |
PWC | https://paperswithcode.com/paper/stein-variational-gradient-descent-as |
Repo | |
Framework | |
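For reference, the particle update analyzed here is the standard SVGD step from Liu and Wang's original algorithm. The NumPy sketch below implements it with an RBF kernel on a toy Gaussian target; the fixed bandwidth and step size are illustrative simplifications (the original method uses a median-heuristic bandwidth).

```python
import numpy as np

def svgd_step(X, grad_logp, h=1.0, eps=0.1):
    """One SVGD update on particles X of shape (n, d).
    grad_logp maps (n, d) particle positions to (n, d) score values."""
    diff = X[:, None, :] - X[None, :, :]                       # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))     # RBF kernel matrix
    drive = K @ grad_logp(X)                                   # sum_j k(x_j, x_i) grad log p(x_j)
    repulse = np.sum(K[:, :, None] * diff, axis=1) / h ** 2    # sum_j grad_{x_j} k(x_j, x_i)
    return X + eps * (drive + repulse) / X.shape[0]

# toy target: standard 2-D Gaussian, for which grad log p(x) = -x
particles = np.random.randn(100, 2) * 3.0 + 5.0
for _ in range(500):
    particles = svgd_step(particles, lambda X: -X)
```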
Analysing Data-To-Text Generation Benchmarks
Title | Analysing Data-To-Text Generation Benchmarks |
Authors | Laura Perez-Beltrachini, Claire Gardent |
Abstract | Recently, several datasets associating data to text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these datasets is linguistically challenging. Do these datasets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these datasets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03802v1 |
http://arxiv.org/pdf/1705.03802v1.pdf | |
PWC | https://paperswithcode.com/paper/analysing-data-to-text-generation-benchmarks |
Repo | |
Framework | |
Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models
Title | Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models |
Authors | Aditya Gangrade, Bobak Nazer, Venkatesh Saligrama |
Abstract | The change detection problem is to determine if the Markov network structures of two Markov random fields differ from one another given two sets of samples drawn from the respective underlying distributions. We study the trade-off between the sample sizes and the reliability of change detection, measured as a minimax risk, for the important cases of the Ising models and the Gaussian Markov random fields restricted to the models which have network structures with $p$ nodes and degree at most $d$, and obtain information-theoretic lower bounds for reliable change detection over these models. We show that for the Ising model, $\Omega\left(\frac{d^2}{(\log d)^2}\log p\right)$ samples are required from each dataset to detect even the sparsest possible changes, and that for the Gaussian, $\Omega\left( \gamma^{-2} \log(p)\right)$ samples are required from each dataset to detect change, where $\gamma$ is the smallest ratio of off-diagonal to diagonal terms in the precision matrices of the distributions. These bounds are compared to the corresponding results in structure learning, and closely match them under mild conditions on the model parameters. Thus, our change detection bounds inherit partial tightness from the structure learning schemes in previous literature, demonstrating that in certain parameter regimes, the naive structure learning based approach to change detection is minimax optimal up to constant factors. |
Tasks | |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10366v1 |
http://arxiv.org/pdf/1710.10366v1.pdf | |
PWC | https://paperswithcode.com/paper/lower-bounds-for-two-sample-structural-change |
Repo | |
Framework | |
End-to-end 3D shape inverse rendering of different classes of objects from a single input image
Title | End-to-end 3D shape inverse rendering of different classes of objects from a single input image |
Authors | Shima Kamyab, S. Zohreh Azimifar |
Abstract | In this paper, a semi-supervised deep framework is proposed for the problem of 3D shape inverse rendering from a single 2D input image. The main structure of the proposed framework consists of unsupervised pre-trained components, which significantly reduce the need for labeled data when training the whole framework. Using labeled data has the advantage of achieving accurate results without the need for predefined assumptions about the image formation process. Three main components are used in the proposed network: an encoder which maps the 2D input image to a representation space, a 3D decoder which decodes a representation to a 3D structure, and a mapping component that maps the 2D representation to the 3D one. The only part that needs labels for training is the mapping component, which has relatively few parameters. The other components in the network can be pre-trained in an unsupervised manner using only 2D images or 3D data, respectively. The way 3D shapes are reconstructed in the decoder component, inspired by model-based methods for 3D reconstruction, maps a low-dimensional representation to the 3D shape space, with the advantage that the basis vectors of the shape space are extracted from the training data itself rather than being restricted to a small set of examples as in predefined models. Therefore, the proposed framework deals directly with the coordinate values of the point cloud representation, which leads to dense 3D shapes in the output. Experimental results on several benchmark datasets of objects and human faces, and comparison with recent similar methods, show the power of the proposed network in recovering more detail from single 2D images. |
Tasks | 3D Reconstruction |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.05858v1 |
http://arxiv.org/pdf/1711.05858v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-3d-shape-inverse-rendering-of |
Repo | |
Framework | |
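A schematic of the three-component pipeline described above, in PyTorch. The layer sizes, point-cloud resolution and fully connected blocks are illustrative assumptions; only the overall structure (pre-trainable encoder and decoder, small supervised mapping) mirrors the abstract.

```python
import torch
import torch.nn as nn

class InverseRenderer(nn.Module):
    """Schematic of the three-component design: a 2D image encoder, a small
    supervised 2D-to-3D mapping network, and a 3D point-cloud decoder. Only
    the mapping needs labeled (image, shape) pairs; the encoder and decoder
    can be pre-trained separately on 2D images and 3D shapes."""
    def __init__(self, img_dim=64 * 64, latent2d=128, latent3d=128, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.encoder = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent2d))       # unsupervised pre-training
        self.mapping = nn.Linear(latent2d, latent3d)                 # the only supervised part
        self.decoder = nn.Sequential(nn.Linear(latent3d, 512), nn.ReLU(),
                                     nn.Linear(512, n_points * 3))   # unsupervised pre-training

    def forward(self, img):                                          # img: (batch, img_dim)
        z3d = self.mapping(self.encoder(img))
        return self.decoder(z3d).view(-1, self.n_points, 3)          # dense point cloud
```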
Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning
Title | Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning |
Authors | Hamid Mirzaei, Tony Givargis |
Abstract | Recent advances in combining deep learning and Reinforcement Learning have shown a promising path for designing new control agents that can learn optimal policies for challenging control tasks. These new methods address the main limitations of conventional Reinforcement Learning methods such as customized feature engineering and small action/state space dimension requirements. In this paper, we leverage one of the state-of-the-art Reinforcement Learning methods, known as Trust Region Policy Optimization, to tackle intersection management for autonomous vehicles. We show that using this method, we can perform fine-grained acceleration control of autonomous vehicles in a grid street plan to achieve a global design objective. |
Tasks | Autonomous Vehicles, Feature Engineering |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10432v1 |
http://arxiv.org/pdf/1705.10432v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-acceleration-control-for |
Repo | |
Framework | |
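To make the problem setup concrete, the sketch below casts single-vehicle acceleration control as a continuous-action MDP of the kind TRPO can optimize. The dynamics, acceleration limits and reward shaping are illustrative stand-ins, not the paper's intersection formulation.

```python
import numpy as np

class IntersectionEnv:
    """Toy single-vehicle view of the control problem: the agent picks a
    continuous acceleration, and the reward penalizes deviation from a target
    speed and large accelerations until the vehicle clears the intersection
    zone. All dynamics and reward terms are illustrative."""
    def __init__(self, dt=0.1, target_speed=10.0):
        self.dt, self.target_speed = dt, target_speed
        self.reset()

    def reset(self):
        self.pos, self.vel = 0.0, 5.0
        return np.array([self.pos, self.vel])

    def step(self, accel):
        accel = float(np.clip(accel, -3.0, 3.0))          # physical acceleration limits
        self.vel = max(0.0, self.vel + accel * self.dt)
        self.pos += self.vel * self.dt
        reward = -abs(self.vel - self.target_speed) - 0.1 * accel ** 2
        done = self.pos >= 100.0                          # crossed the intersection zone
        return np.array([self.pos, self.vel]), reward, done
```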
On Fairness, Diversity and Randomness in Algorithmic Decision Making
Title | On Fairness, Diversity and Randomness in Algorithmic Decision Making |
Authors | Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, Adrian Weller |
Abstract | Consider a binary decision making process where a single machine learning classifier replaces a multitude of humans. We raise questions about the resulting loss of diversity in the decision making process. We study the potential benefits of using random classifier ensembles instead of a single classifier in the context of fairness-aware learning and demonstrate various attractive properties: (i) an ensemble of fair classifiers is guaranteed to be fair, for several different measures of fairness, (ii) an ensemble of unfair classifiers can still achieve fair outcomes, and (iii) an ensemble of classifiers can achieve better accuracy-fairness trade-offs than a single classifier. Finally, we introduce notions of distributional fairness to characterize further potential benefits of random classifier ensembles. |
Tasks | Decision Making |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10208v1 |
http://arxiv.org/pdf/1706.10208v1.pdf | |
PWC | https://paperswithcode.com/paper/on-fairness-diversity-and-randomness-in |
Repo | |
Framework | |
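A minimal sketch of the randomized-ensemble decision rule and one fairness measure (the demographic parity gap) of the sort discussed above. The scikit-learn-style `.predict()` interface and this particular measure are assumptions for illustration; the paper considers several fairness measures.

```python
import numpy as np

def randomized_ensemble_predict(classifiers, X, rng=None):
    """For each instance, sample one ensemble member uniformly at random and
    use its decision (the alternative is deterministic score averaging).
    Classifiers are assumed to expose a scikit-learn-style .predict()."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.integers(len(classifiers), size=len(X))
    return np.array([classifiers[i].predict(X[j:j + 1])[0] for j, i in enumerate(idx)])

def demographic_parity_gap(y_pred, group):
    """Gap in positive-decision rates between two groups encoded as 0/1."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())
```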
Self-Supervised Learning for Stereo Matching with Self-Improving Ability
Title | Self-Supervised Learning for Stereo Matching with Self-Improving Ability |
Authors | Yiran Zhong, Yuchao Dai, Hongdong Li |
Abstract | Existing deep-learning based dense stereo matching methods often rely on ground-truth disparity maps as the training signal, which are however not always available in many situations. In this paper, we design a simple convolutional neural network architecture that is able to learn to compute dense disparity maps directly from the stereo inputs. Training is performed in an end-to-end fashion without the need for ground-truth disparity maps. The idea is to use the image warping error (instead of disparity-map residuals) as the loss function to drive the learning process, aiming to find a depth map that minimizes the warping error. While this is a simple concept well known in stereo matching, making it work in a deep-learning framework requires overcoming many non-trivial challenges, and in this work we provide effective solutions. Our network is self-adaptive to different unseen imagery as well as to different camera settings. Experiments on the KITTI and Middlebury stereo benchmark datasets show that our method outperforms many state-of-the-art stereo matching methods by a margin, while being significantly faster. |
Tasks | Stereo Matching, Stereo Matching Hand |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00930v1 |
http://arxiv.org/pdf/1709.00930v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-for-stereo-matching |
Repo | |
Framework | |
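The core self-supervised signal, the image-warping (photometric) error, can be sketched compactly in PyTorch: warp the right image into the left view using the predicted disparity and penalize the reconstruction error. The L1 loss and bilinear sampling below are illustrative; the paper's full loss likely includes further terms.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the rectified right image at
    x - d(x, y). right: (B, C, H, W); disp: (B, 1, H, W) in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().unsqueeze(0).to(right.device) - disp[:, 0]        # shifted x coordinates
    ys = ys.float().unsqueeze(0).expand(b, -1, -1).to(right.device)
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)  # normalize to [-1, 1]
    return F.grid_sample(right, grid, align_corners=True)

def warping_loss(left, right, disp):
    """L1 photometric error between the real left image and its reconstruction."""
    return (left - warp_right_to_left(right, disp)).abs().mean()
```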
Detecting Semantic Parts on Partially Occluded Objects
Title | Detecting Semantic Parts on Partially Occluded Objects |
Authors | Jianyu Wang, Cihang Xie, Zhishuai Zhang, Jun Zhu, Lingxi Xie, Alan Yuille |
Abstract | In this paper, we address the task of detecting semantic parts on partially occluded objects. We consider a scenario where the model is trained using non-occluded images but tested on occluded images. The motivation is that there is an infinite number of occlusion patterns in the real world, which cannot be fully covered by the training data, so the models should be inherently robust and adaptive to occlusions instead of fitting/learning the occlusion patterns in the training data. Our approach detects semantic parts by accumulating the confidence of local visual cues. Specifically, the method uses a simple voting scheme, based on log-likelihood ratio tests and spatial constraints, to combine the evidence of local cues. These cues are called visual concepts, which are derived by clustering the internal states of deep networks. We evaluate our voting scheme on the VehicleSemanticPart dataset with dense part annotations. We randomly place two, three or four irrelevant objects onto the target object to generate testing images with various occlusions. Experiments show that our algorithm outperforms several competitors in semantic part detection when occlusions are present. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07819v1 |
http://arxiv.org/pdf/1707.07819v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-semantic-parts-on-partially |
Repo | |
Framework | |
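A toy version of the voting step described in the abstract: each detected visual concept casts a vote for the part location at a learned spatial offset, weighted by its log-likelihood ratio score. The offset model and positive-evidence thresholding below are simplified assumptions; the paper additionally enforces spatial constraints.

```python
import numpy as np

def vote_for_part(concept_hits, offset_stats, grid_shape):
    """Accumulate log-likelihood-ratio votes from detected visual concepts.
    concept_hits: list of (concept_id, y, x, llr) -- a firing of a visual
    concept at (y, x) with its log-likelihood ratio score.
    offset_stats: dict concept_id -> (dy, dx) mean offset from concept to part.
    Returns a vote map whose peak is the predicted part location."""
    votes = np.zeros(grid_shape)
    for cid, y, x, llr in concept_hits:
        dy, dx = offset_stats[cid]
        py, px = int(round(y + dy)), int(round(x + dx))
        if 0 <= py < grid_shape[0] and 0 <= px < grid_shape[1]:
            votes[py, px] += max(llr, 0.0)        # only positive evidence votes
    return votes
```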
Density-Wise Two Stage Mammogram Classification using Texture Exploiting Descriptors
Title | Density-Wise Two Stage Mammogram Classification using Texture Exploiting Descriptors |
Authors | Aditya A. Shastri, Deepti Tamrakar, Kapil Ahuja |
Abstract | Breast cancer is becoming pervasive with each passing day. Hence, its early detection is a big step in saving the life of any patient. Mammography is a common tool in breast cancer diagnosis. The most important step here is the classification of mammogram patches as normal-abnormal and benign-malignant. The texture of a breast in a mammogram patch plays a significant role in these classifications. We propose a variation of the Histogram of Gradients (HOG) and Gabor filter combination, called Histogram of Oriented Texture (HOT), that exploits this fact. We also revisit the Pass Band Discrete Cosine Transform (PB-DCT) descriptor, which captures texture information well. Not all features of a mammogram patch may be useful. Hence, we apply a feature selection technique called Discrimination Potentiality (DP). Our resulting descriptors, DP-HOT and DP-PB-DCT, are compared with the standard descriptors. The density of a mammogram patch is important for classification and has not been studied exhaustively. The Image Retrieval in Medical Applications (IRMA) database from RWTH Aachen, Germany, is a standard database that provides mammogram patches, and most researchers have tested their frameworks only on a subset of patches from this database. We apply our two new descriptors on all images of the IRMA database for density-wise classification and compare with the standard descriptors. We achieve higher accuracy than all of the existing standard descriptors (more than 92%). |
Tasks | Feature Selection, Image Retrieval |
Published | 2017-01-15 |
URL | http://arxiv.org/abs/1701.04010v4 |
http://arxiv.org/pdf/1701.04010v4.pdf | |
PWC | https://paperswithcode.com/paper/density-wise-two-stage-mammogram |
Repo | |
Framework | |
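One plausible reading of the Histogram of Oriented Texture (HOT) descriptor, sketched with scikit-image: compute HOG-style histograms on the magnitude responses of a small Gabor filter bank. The frequencies, orientations and cell sizes are assumptions; the paper's exact HOT construction may differ.

```python
import numpy as np
from skimage.feature import hog
from skimage.filters import gabor

def hot_descriptor(patch, frequencies=(0.1, 0.2), n_orient=8):
    """Illustrative HOT: HOG computed on Gabor response magnitudes.
    `patch` is a grayscale mammogram patch of at least 16x16 pixels."""
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(patch, frequency=f, theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)                       # Gabor response magnitude
            feats.append(hog(mag, orientations=9,
                             pixels_per_cell=(16, 16), cells_per_block=(1, 1)))
    return np.concatenate(feats)
```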
Deep Learning Features at Scale for Visual Place Recognition
Title | Deep Learning Features at Scale for Visual Place Recognition |
Authors | Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford |
Abstract | The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area. |
Tasks | Visual Place Recognition |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.05105v1 |
http://arxiv.org/pdf/1701.05105v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-features-at-scale-for-visual |
Repo | |
Framework | |
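Treating place recognition as classification, as the abstract describes, can be sketched as follows: train a CNN with one class per place, then reuse the penultimate embedding for matching. The tiny backbone below is purely illustrative; the paper trains full-scale CNN architectures with multi-scale feature encoding.

```python
import torch
import torch.nn as nn

class PlaceNet(nn.Module):
    """Schematic of place recognition as classification: one class per place
    during training; the penultimate embedding is reused for matching."""
    def __init__(self, n_places=1000, emb_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, emb_dim))
        self.classifier = nn.Linear(emb_dim, n_places)

    def forward(self, x):                       # x: (batch, 3, H, W)
        emb = self.backbone(x)
        return self.classifier(emb), emb

# matching: compare embeddings of two images, e.g. with
# torch.nn.functional.cosine_similarity(emb_a, emb_b)
```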
End-to-end Lung Nodule Detection in Computed Tomography
Title | End-to-end Lung Nodule Detection in Computed Tomography |
Authors | Dufan Wu, Kyungsang Kim, Bin Dong, Georges El Fakhri, Quanzheng Li |
Abstract | A computer aided diagnostic (CAD) system is crucial for modern medical imaging. But almost all CAD systems operate on reconstructed images, which were optimized for radiologists. Computer vision can capture features that are subtle to human observers, so it is desirable to design a CAD system operating on the raw data. In this paper, we propose a deep-neural-network-based detection system for lung nodule detection in computed tomography (CT). A primal-dual-type deep reconstruction network is applied first to convert the raw data to the image space, followed by a 3-dimensional convolutional neural network (3D-CNN) for the nodule detection. For efficient network training, the deep reconstruction network and the CNN detector were trained sequentially first, followed by one epoch of end-to-end fine-tuning. The method was evaluated on the Lung Image Database Consortium image collection (LIDC-IDRI) with simulated forward projections. With 144 multi-slice fanbeam projections, the proposed end-to-end detector could achieve comparable sensitivity to the reference detector, which was trained and applied on the fully-sampled image data. It also demonstrated superior detection performance compared to detectors trained on the reconstructed images. The proposed method is general and could be expanded to most detection tasks in medical imaging. |
Tasks | Computed Tomography (CT), Lung Nodule Detection |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.02074v2 |
http://arxiv.org/pdf/1711.02074v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-lung-nodule-detection-in-computed |
Repo | |
Framework | |
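The three-stage training schedule described above (reconstruction network, then detector, then one epoch of joint fine-tuning) can be sketched as follows. The losses, optimizers and single-batch data handling are illustrative placeholders; network definitions and data loading are omitted.

```python
import torch
import torch.nn.functional as F

def train_pipeline(recon_net, detector, projections, images, labels, epochs_seq=10):
    """Sequential training followed by one epoch of end-to-end fine-tuning.
    The whole dataset is treated as a single batch for brevity."""
    opt_r = torch.optim.Adam(recon_net.parameters())
    opt_d = torch.optim.Adam(detector.parameters())
    for _ in range(epochs_seq):                      # stage 1: reconstruction network alone
        loss = F.mse_loss(recon_net(projections), images)
        opt_r.zero_grad(); loss.backward(); opt_r.step()
    for _ in range(epochs_seq):                      # stage 2: detector on frozen reconstructions
        loss = F.binary_cross_entropy_with_logits(
            detector(recon_net(projections).detach()), labels)
        opt_d.zero_grad(); loss.backward(); opt_d.step()
    opt_all = torch.optim.Adam(
        list(recon_net.parameters()) + list(detector.parameters()), lr=1e-5)
    loss = F.binary_cross_entropy_with_logits(detector(recon_net(projections)), labels)
    opt_all.zero_grad(); loss.backward(); opt_all.step()   # stage 3: one pass of joint fine-tuning
```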
Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs
Title | Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs |
Authors | Mahdi Imani, Ulisses Braga-Neto |
Abstract | This paper is concerned with the problem of stochastic control of gene regulatory networks (GRNs) observed indirectly through noisy measurements and with uncertainty in the intervention inputs. The partial observability of the gene states and the uncertainty in the intervention process are accounted for by modeling GRNs using the partially-observed Boolean dynamical system (POBDS) signal model with noisy gene expression measurements. The optimal infinite-horizon control strategy for this problem is not attainable in general, and we apply reinforcement learning and Gaussian process techniques to find a near-optimal solution. The POBDS is first transformed into a directly-observed Markov decision process in a continuous belief space, and a Gaussian process is used for modeling the cost function over the belief and intervention spaces. Reinforcement learning is then used to learn the cost function from the available gene expression data. In addition, we employ sparsification, which enables the control of large partially-observed GRNs. The performance of the resulting algorithm is studied through a comprehensive set of numerical experiments using synthetic gene expression data generated from a melanoma gene regulatory network. |
Tasks | |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07652v1 |
http://arxiv.org/pdf/1702.07652v1.pdf | |
PWC | https://paperswithcode.com/paper/control-of-gene-regulatory-networks-with |
Repo | |
Framework | |
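The belief-space view underlying this approach can be illustrated with a single Bayesian filtering step: the posterior over the Boolean gene states is propagated through the intervention-dependent transition model and updated with the noisy expression measurement. The dense-matrix form below is a simplification suitable only for small networks.

```python
import numpy as np

def belief_update(belief, T_u, obs_lik):
    """One step of the belief update that turns the POBDS into a belief-space MDP.
    belief: (S,) prior over the 2^n Boolean states; T_u: (S, S) transition matrix
    under intervention u, with T_u[i, j] = P(next=j | current=i);
    obs_lik: (S,) likelihood of the noisy expression reading under each state."""
    predicted = T_u.T @ belief                 # prediction through the intervention model
    posterior = predicted * obs_lik            # measurement update
    return posterior / posterior.sum()         # normalize to a probability vector
```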