Paper Group ANR 503
On Support Relations and Semantic Scene Graphs
Title | On Support Relations and Semantic Scene Graphs |
Authors | Michael Ying Yang, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn |
Abstract | Scene understanding is a popular and challenging topic in both computer vision and photogrammetry. A scene graph provides rich information for such scene understanding. This paper presents a novel approach to infer support relations and then to construct the scene graph. Support relations are estimated by considering important, previously ignored information: the physical stability and the prior support knowledge between object classes. In contrast to previous methods for extracting support relations, the proposed approach generates more accurate results and does not require a pixel-wise semantic labeling of the scene. The semantic scene graph, which describes all the contextual relations within the scene, is constructed using this information. To evaluate the accuracy of these graphs, multiple different measures are formulated. The proposed algorithms are evaluated using the NYUv2 database. The results demonstrate that the inferred support relations are more precise than the state of the art. The scene graphs are compared against ground-truth graphs. |
Tasks | Scene Understanding |
Published | 2016-09-19 |
URL | http://arxiv.org/abs/1609.05834v4 |
http://arxiv.org/pdf/1609.05834v4.pdf | |
PWC | https://paperswithcode.com/paper/on-support-relations-and-semantic-scene |
Repo | |
Framework | |
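As an illustration only (not the authors' inference algorithm), the sketch below combines a physical-stability score with a class-level support prior to pick the most plausible supporter for each object, and assembles the result into a directed "supported-by" scene graph. All object names and scores are invented placeholders.

```python
# stability[a][b]: plausibility that b physically supports a (illustrative numbers).
stability = {
    "cup":   {"table": 0.9, "floor": 0.2, "book": 0.6},
    "book":  {"table": 0.8, "floor": 0.3},
    "table": {"floor": 0.95},
}
# prior[a][b]: how often objects of class b support objects of class a in training data.
prior = {
    "cup":   {"table": 0.7, "floor": 0.1, "book": 0.2},
    "book":  {"table": 0.6, "floor": 0.2},
    "table": {"floor": 0.9},
}

def infer_support(obj):
    """Pick the candidate supporter with the highest combined score."""
    cands = stability[obj]
    return max(cands, key=lambda c: cands[c] * prior[obj].get(c, 0.0))

# Directed scene graph: edge (a -> b) means "a is supported by b".
scene_graph = {obj: infer_support(obj) for obj in stability}
print(scene_graph)  # {'cup': 'table', 'book': 'table', 'table': 'floor'}
```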
The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition
Title | The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition |
Authors | Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang |
Abstract | This paper describes the Arabic Multi-Genre Broadcast (MGB-2) Challenge for SLT-2016. Unlike last year’s English MGB Challenge, which focused on recognition of diverse TV genres, this year’s challenge emphasises handling the dialect diversity in Arabic speech. The audio data come from 19 distinct programmes broadcast on the Aljazeera Arabic TV channel between March 2005 and December 2015. Programmes are split into three groups: conversations, interviews, and reports. A total of 1,200 hours have been released with lightly supervised transcriptions for acoustic modelling. For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, spanning the years 2000-2011. Two lexicons have been provided, one phoneme based and one grapheme based. Finally, two tasks were proposed for this year’s challenge: standard speech transcription and word alignment. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained. |
Tasks | Acoustic Modelling, Language Modelling, Word Alignment |
Published | 2016-09-19 |
URL | https://arxiv.org/abs/1609.05625v3 |
https://arxiv.org/pdf/1609.05625v3.pdf | |
PWC | https://paperswithcode.com/paper/the-mgb-2-challenge-arabic-multi-dialect |
Repo | |
Framework | |
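The transcription task in such challenges is conventionally scored by word error rate. Below is a minimal, self-contained WER computation via word-level edit distance; it is a generic sketch, not the official MGB-2 scoring pipeline.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```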
Matrix Variate RBM and Its Applications
Title | Matrix Variate RBM and Its Applications |
Authors | Guanglei Qi, Yanfeng Sun, Junbin Gao, Yongli Hu, Jinghua Li |
Abstract | The Restricted Boltzmann Machine (RBM) is an important generative model for vectorial data. When an RBM is applied to images in practice, the data have to be vectorized. This results in high-dimensional data, and valuable spatial information is lost in the vectorization. In this paper, a Matrix-Variate Restricted Boltzmann Machine (MVRBM) model is proposed by generalizing the classic RBM to explicitly model matrix data. In the new RBM model, both input and hidden variables are in matrix form and are connected by bilinear transforms. The MVRBM has far fewer model parameters, resulting in a faster training algorithm while retaining performance comparable to the classic RBM. The advantages of the MVRBM are demonstrated on two real-world applications: image super-resolution and handwritten digit recognition. |
Tasks | Handwritten Digit Recognition, Image Super-Resolution, Super-Resolution |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00722v1 |
http://arxiv.org/pdf/1601.00722v1.pdf | |
PWC | https://paperswithcode.com/paper/matrix-variate-rbm-and-its-applications |
Repo | |
Framework | |
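A minimal NumPy sketch of the bilinear conditionals typically used in a matrix-variate RBM, with hidden pre-activation U X Vᵀ + C and the transposed map for reconstruction. The shapes, initialisation, and single Gibbs step below are assumptions for illustration, not the authors' training code.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 28, 28     # visible matrix size (e.g. an image)
p, q = 10, 10     # hidden matrix size

U = 0.01 * rng.standard_normal((p, m))   # left factor
V = 0.01 * rng.standard_normal((q, n))   # right factor
B = np.zeros((m, n))                     # visible bias
C = np.zeros((p, q))                     # hidden bias

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def hidden_given_visible(X):
    # P(H_ij = 1 | X) under the bilinear energy: pre-activation U X V^T + C
    return sigmoid(U @ X @ V.T + C)

def visible_given_hidden(H):
    # P(X_ij = 1 | H): the transposed bilinear map U^T H V + B
    return sigmoid(U.T @ H @ V + B)

X = (rng.random((m, n)) > 0.5).astype(float)
H = (hidden_given_visible(X) > rng.random((p, q))).astype(float)  # one Gibbs step
X_recon = visible_given_hidden(H)
print(X_recon.shape)  # (28, 28)
```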
Knowledge Distillation for Small-footprint Highway Networks
Title | Knowledge Distillation for Small-footprint Highway Networks |
Authors | Liang Lu, Michelle Guo, Steve Renals |
Abstract | Deep learning has significantly advanced the state of the art in speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger and are therefore less suitable for deployment on embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated feedforward neural network. We have shown that HDNN-based acoustic models can achieve recognition accuracy comparable to plain deep neural network (DNN) acoustic models with a much smaller number of model parameters. In this paper, we push the boundary further by leveraging the knowledge distillation technique, also known as {\it teacher-student} training: we train the compact HDNN model with the supervision of a high-accuracy but cumbersome model. Furthermore, we also investigate sequence training and adaptation in the context of teacher-student training. Our experiments were performed on the AMI meeting speech recognition corpus. With this technique, we significantly improved the recognition accuracy of the HDNN acoustic model with fewer than 0.8 million parameters, and narrowed the gap between this model and the plain DNN with 30 million parameters. |
Tasks | Acoustic Modelling, Speech Recognition |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00892v3 |
http://arxiv.org/pdf/1608.00892v3.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-distillation-for-small-footprint |
Repo | |
Framework | |
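A common way to implement the teacher-student objective described above is a temperature-softened KL term plus the usual cross-entropy with hard labels. The PyTorch sketch below assumes this standard formulation; the temperature, weighting, and the teacher/HDNN architectures are placeholders, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of (i) KL between temperature-softened student and teacher
    distributions and (ii) ordinary cross-entropy with the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # standard T^2 scaling of the soft term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with dummy tensors (batch of 8, 40 output targets):
student_logits = torch.randn(8, 40, requires_grad=True)
teacher_logits = torch.randn(8, 40)
labels = torch.randint(0, 40, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```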
Bending the Curve: Improving the ROC Curve Through Error Redistribution
Title | Bending the Curve: Improving the ROC Curve Through Error Redistribution |
Authors | Oran Richman, Shie Mannor |
Abstract | Classification performance is often not uniform over the data: some areas of the input space are easier to classify than others. Features that hold information about the “difficulty” of the data may be non-discriminative and are therefore disregarded in the classification process. We propose a meta-learning approach in which performance is improved by post-processing, namely by establishing a dynamic threshold on the base-classifier results. Since the base classifier is treated as a “black box”, the presented method can be used with any state-of-the-art classifier in order to try to improve its performance. We focus our attention on how to better control the true-positive/false-positive trade-off known as the ROC curve. We propose an algorithm for deriving optimal thresholds by redistributing the error depending on features that hold information about difficulty. We demonstrate the resulting benefit on both synthetic and real-life data. |
Tasks | Meta-Learning |
Published | 2016-05-21 |
URL | http://arxiv.org/abs/1605.06652v1 |
http://arxiv.org/pdf/1605.06652v1.pdf | |
PWC | https://paperswithcode.com/paper/bending-the-curve-improving-the-roc-curve |
Repo | |
Framework | |
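The core idea, redistributing errors by setting different thresholds for "easy" and "hard" examples while keeping the overall false-positive rate within a budget, can be sketched with a simple grid search on synthetic scores. This is only an illustration of the mechanism, not the paper's optimal-threshold derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
hard = rng.random(n) < 0.5                      # difficulty flag (non-discriminative)
y = rng.random(n) < 0.5                         # ground-truth labels
noise = np.where(hard, 1.5, 0.5)                # hard examples get noisier scores
scores = y.astype(float) + noise * rng.standard_normal(n)

def rates(thr_easy, thr_hard):
    pred = np.where(hard, scores > thr_hard, scores > thr_easy)
    return (pred & y).mean() / y.mean(), (pred & ~y).mean() / (~y).mean()

# Grid-search a pair of thresholds that maximises TPR under an overall FPR budget.
target_fpr, best = 0.10, (0.0, 0.0, -1.0)
for te in np.linspace(-2, 3, 51):
    for th in np.linspace(-2, 3, 51):
        tpr, fpr = rates(te, th)
        if fpr <= target_fpr and tpr > best[2]:
            best = (te, th, tpr)
print("thresholds (easy, hard): (%.2f, %.2f), TPR=%.3f" % best)
```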
Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
Title | Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm |
Authors | Junpei Komiyama, Junya Honda, Hiroshi Nakagawa |
Abstract | We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-RMED) and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this issue, we devise an efficient version (ECW-RMED) and derive its asymptotic regret bound. Experimental comparisons of dueling bandit algorithms show that ECW-RMED significantly outperforms existing ones. |
Tasks | |
Published | 2016-05-05 |
URL | http://arxiv.org/abs/1605.01677v2 |
http://arxiv.org/pdf/1605.01677v2.pdf | |
PWC | https://paperswithcode.com/paper/copeland-dueling-bandit-problem-regret-lower |
Repo | |
Framework | |
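For orientation, the sketch below computes Copeland scores and winners from a (known) preference matrix; this is just the target quantity of the bandit problem, not the CW-RMED or ECW-RMED sampling rules.

```python
import numpy as np

# P[i, j] = probability that arm i beats arm j in a duel (illustrative values).
P = np.array([
    [0.5, 0.6, 0.4, 0.7],
    [0.4, 0.5, 0.8, 0.6],
    [0.6, 0.2, 0.5, 0.9],
    [0.3, 0.4, 0.1, 0.5],
])

beats = (P > 0.5) & ~np.eye(len(P), dtype=bool)   # i beats j iff P[i, j] > 1/2
copeland_scores = beats.sum(axis=1)               # number of arms each arm beats
winners = np.flatnonzero(copeland_scores == copeland_scores.max())
print(copeland_scores, winners)                   # [2 2 2 0] -> winners [0 1 2]
```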
Absolute Pose Estimation from Line Correspondences using Direct Linear Transformation
Title | Absolute Pose Estimation from Line Correspondences using Direct Linear Transformation |
Authors | Bronislav Přibyl, Pavel Zemčík, Martin Čadík |
Abstract | This work is concerned with camera pose estimation from correspondences of 3D/2D lines, i.e. with the Perspective-n-Line (PnL) problem. We focus on large line sets, which can be efficiently solved by methods using a linear formulation of PnL. We propose a novel method, “DLT-Combined-Lines”, based on the Direct Linear Transformation (DLT) algorithm, which benefits from a new combination of two existing DLT methods for pose estimation. The method represents the 2D structure by lines and the 3D structure by both points and lines. The redundant 3D information reduces the minimum required number of line correspondences to 5. A cornerstone of the method is a combined projection matrix estimated by the DLT algorithm. It contains multiple estimates of camera rotation and translation, which can be recovered after enforcing constraints of the matrix. The multiplicity of the estimates is exploited to improve the accuracy of the proposed method. For large line sets (10 and more), the method is comparable to the state of the art in accuracy of orientation estimation. It achieves state-of-the-art accuracy in the estimation of camera position and yields the smallest reprojection error under strong image noise. The method achieves top-3 results on real-world data. The proposed method is also highly computationally efficient, estimating the pose from 1000 lines in 12 ms on a desktop computer. |
Tasks | Pose Estimation |
Published | 2016-08-24 |
URL | http://arxiv.org/abs/1608.06891v2 |
http://arxiv.org/pdf/1608.06891v2.pdf | |
PWC | https://paperswithcode.com/paper/absolute-pose-estimation-from-line |
Repo | |
Framework | |
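The classic point-based DLT gives a feel for the linear formulation the paper builds on: stack two linear constraints per correspondence and take the smallest right singular vector as the flattened projection matrix. The sketch below is the standard point-based variant, not the paper's combined point-and-line formulation.

```python
import numpy as np

def dlt_projection_matrix(X3d, x2d):
    """Classic point-based DLT: estimate a 3x4 projection matrix P (up to scale)
    from n >= 6 correspondences X (n,3) <-> x (n,2) by solving A p = 0 via SVD."""
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(X3d, x2d):
        Xh = [Xw, Yw, Zw, 1.0]
        rows.append([0, 0, 0, 0] + [-c for c in Xh] + [v * c for c in Xh])
        rows.append(Xh + [0, 0, 0, 0] + [-u * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)      # null-space vector = flattened P

# Sanity check: project points with a known P and recover it up to scale.
P_true = np.hstack([np.eye(3), np.array([[0.1], [0.2], [2.0]])])
X3d = np.random.default_rng(0).uniform(-1, 1, (8, 3)) + [0, 0, 5]
xh = (P_true @ np.c_[X3d, np.ones(len(X3d))].T).T
x2d = xh[:, :2] / xh[:, 2:]
P_est = dlt_projection_matrix(X3d, x2d)
print(np.allclose(P_est / P_est[2, 2] * P_true[2, 2], P_true, atol=1e-6))  # True
```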
Learning scale-variant and scale-invariant features for deep image classification
Title | Learning scale-variant and scale-invariant features for deep image classification |
Authors | Nanne van Noord, Eric Postma |
Abstract | Convolutional Neural Networks (CNNs) require large image corpora to be trained on classification tasks. The variation in image resolutions, in the sizes of the objects and patterns depicted, and in image scales hampers CNN training and performance, because the task-relevant information varies over spatial scales. Previous work attempting to deal with such scale variations focused on encouraging scale-invariant CNN representations. However, scale-invariant representations are incomplete representations of images, because images also contain scale-variant information. This paper addresses the combined development of scale-invariant and scale-variant representations. We propose a multi-scale CNN method to encourage the recognition of both types of features and evaluate it on a challenging image classification task involving task-relevant characteristics at multiple scales. The results show that our multi-scale CNN outperforms a single-scale CNN. This leads to the conclusion that encouraging the combined development of scale-invariant and scale-variant representations in CNNs is beneficial to image recognition performance. |
Tasks | Image Classification |
Published | 2016-02-03 |
URL | http://arxiv.org/abs/1602.01255v2 |
http://arxiv.org/pdf/1602.01255v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-scale-variant-and-scale-invariant |
Repo | |
Framework | |
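One simple way to realise the multi-scale idea is to run a shared convolutional trunk on several rescaled copies of the input and concatenate the pooled features. The PyTorch sketch below assumes a toy trunk and scale set; it is not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCNN(nn.Module):
    """Shared conv trunk applied at several image scales; pooled features are
    concatenated so both scale-variant and scale-invariant cues can be learned."""
    def __init__(self, scales=(1.0, 0.5, 0.25), n_classes=10):
        super().__init__()
        self.scales = scales
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # global pooling -> (N, 64, 1, 1)
        )
        self.classifier = nn.Linear(64 * len(scales), n_classes)

    def forward(self, x):
        feats = []
        for s in self.scales:
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode="bilinear", align_corners=False)
            feats.append(self.trunk(xs).flatten(1))   # (N, 64) per scale
        return self.classifier(torch.cat(feats, dim=1))

logits = MultiScaleCNN()(torch.randn(2, 3, 128, 128))
print(logits.shape)   # torch.Size([2, 10])
```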
Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
Title | Assessing and tuning brain decoders: cross-validation, caveats, and guidelines |
Authors | Gaël Varoquaux, Pradeep Reddy Raamana, Denis Engemann, Andrés Hoyos-Idrobo, Yannick Schwartz, Bertrand Thirion |
Abstract | Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders’ hyper-parameters. This paper is a review of cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI, and MEG) and simulations. Theory and experiments show that the popular “leave-one-out” strategy leads to unstable and biased estimates, and that a repeated-random-splits method should be preferred. The experiments also highlight the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Nested cross-validation can tune decoders’ parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders. |
Tasks | |
Published | 2016-06-16 |
URL | http://arxiv.org/abs/1606.05201v2 |
http://arxiv.org/pdf/1606.05201v2.pdf | |
PWC | https://paperswithcode.com/paper/assessing-and-tuning-brain-decoders-cross |
Repo | |
Framework | |
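The paper's two practical recommendations, repeated random splits for evaluation and nested cross-validation for hyperparameter tuning, map directly onto scikit-learn. The sketch below uses a synthetic dataset and a logistic-regression decoder as placeholders, not the paper's neuroimaging data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Inner loop: tune the decoder's hyper-parameter (here, logistic-regression C).
inner = GridSearchCV(
    LogisticRegression(max_iter=2000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)

# Outer loop: repeated random splits rather than leave-one-out, as recommended.
outer = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer)

# Report the spread, not just the mean: CV error bars are large in this regime.
print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```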
An Interactive Segmentation Tool for Quantifying Fat in Lumbar Muscles using Axial Lumbar-Spine MRI
Title | An Interactive Segmentation Tool for Quantifying Fat in Lumbar Muscles using Axial Lumbar-Spine MRI |
Authors | Joseph Antony, Kevin McGuinness, Neil Welch, Joe Coyle, Andy Franklyn-Miller, Noel E. O’Connor, Kieran Moran |
Abstract | In this paper we present an interactive tool that can be used to quantify fat infiltration in lumbar muscles, which is useful in studying fat infiltration and lower back pain (LBP) in adults. Currently, a qualitative assessment by visual grading on a 5-point scale is used to study fat infiltration in lumbar muscles from an axial view of lumbar-spine MR images. However, a quantitative approach (on a continuous scale of 0-100%) may provide greater insight. In this paper, we propose a method to precisely quantify the fat deposition/infiltration in a user-defined region of the lumbar muscles, which may aid better diagnosis and analysis. The key steps are: interactively segmenting the region of interest (ROI) from the lumbar muscles using the well-known livewire technique; identifying fatty regions in the segmented region based on adjustable threshold and softness levels; automatically detecting the center of the spinal column and fragmenting the lumbar muscles into smaller regions with reference to this center; computing key parameters such as total and region-wise fat content percentage, total cross-sectional area (TCSA) and functional cross-sectional area (FCSA); and exporting the computations and associated patient information from the MRI into a database. A standalone application with an intuitive graphical user interface (GUI) was developed in MATLAB R2014a to perform the required computations. |
Tasks | Interactive Segmentation |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02744v1 |
http://arxiv.org/pdf/1609.02744v1.pdf | |
PWC | https://paperswithcode.com/paper/an-interactive-segmentation-tool-for |
Repo | |
Framework | |
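The quantification step can be sketched as: threshold the intensities inside the segmented ROI, then report the fat percentage and cross-sectional areas. The NumPy sketch below assumes a rectangular stand-in for the livewire ROI and an arbitrary threshold; it is not the MATLAB tool itself.

```python
import numpy as np

def fat_metrics(image, roi_mask, fat_threshold, pixel_area_mm2=1.0):
    """Percent fat, total CSA and functional CSA for one ROI of an axial slice.
    `image` is the MR slice, `roi_mask` a boolean mask from the interactive
    segmentation, `fat_threshold` the user-chosen intensity cut-off."""
    roi = roi_mask.astype(bool)
    fat = roi & (image >= fat_threshold)          # bright pixels = fat
    tcsa = roi.sum() * pixel_area_mm2             # total cross-sectional area
    fcsa = (roi & ~fat).sum() * pixel_area_mm2    # lean (functional) area
    fat_pct = 100.0 * fat.sum() / max(roi.sum(), 1)
    return fat_pct, tcsa, fcsa

rng = np.random.default_rng(0)
slice_ = rng.integers(0, 255, (256, 256)).astype(float)
roi = np.zeros((256, 256), dtype=bool)
roi[100:150, 80:160] = True                       # stand-in for a livewire ROI
print(fat_metrics(slice_, roi, fat_threshold=180, pixel_area_mm2=0.5))
```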
Stochastic Patching Process
Title | Stochastic Patching Process |
Authors | Xuhui Fan, Bin Li, Yi Wang, Yang Wang, Fang Chen |
Abstract | Stochastic partition models tailor a product space into a number of rectangular regions such that the data within each region exhibit certain types of homogeneity. Due to the constraints of their partition strategies, existing models may cause unnecessary dissections in sparse regions when fitting data in dense regions. To alleviate this limitation, we propose a parsimonious partition model, named the Stochastic Patching Process (SPP), to deal with multi-dimensional arrays. SPP adopts an “enclosing” strategy to attach rectangular patches to dense regions. SPP is self-consistent, so it can be extended to infinite arrays. We apply SPP to relational modeling, and the experimental results validate its merit compared to the state of the art. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06886v2 |
http://arxiv.org/pdf/1605.06886v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-patching-process |
Repo | |
Framework | |
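As a toy illustration of the "enclosing" strategy (attach rectangular patches only where the array is dense, leave sparse regions uncut), the sketch below proposes random boxes on a synthetic relational array and keeps the dense ones. It is not the SPP generative model or its inference.

```python
import numpy as np

rng = np.random.default_rng(0)
# A 2D relational array with two dense blocks and a sparse background.
A = (rng.random((60, 60)) < 0.02).astype(int)
A[5:20, 5:25] = (rng.random((15, 20)) < 0.7)
A[35:55, 30:55] = (rng.random((20, 25)) < 0.6)

def propose_patch(max_size=30):
    """Sample a random axis-aligned rectangle inside the array."""
    h, w = rng.integers(5, max_size, 2)
    r, c = rng.integers(0, 60 - h), rng.integers(0, 60 - w)
    return r, c, h, w

# Keep only patches that enclose dense regions; sparse areas are never dissected.
patches = []
for _ in range(500):
    r, c, h, w = propose_patch()
    if A[r:r + h, c:c + w].mean() > 0.4:
        patches.append((r, c, h, w))
print(len(patches), "dense patches kept out of 500 proposals")
```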
Generalized support vector regression: duality and tensor-kernel representation
Title | Generalized support vector regression: duality and tensor-kernel representation |
Authors | Saverio Salzo, Johan A. K. Suykens |
Abstract | In this paper we study the variational problem associated with support vector regression in Banach function spaces. Using the Fenchel-Rockafellar duality theory, we give an explicit formulation of the dual problem as well as of the related optimality conditions. Moreover, we provide a new computational framework for solving the problem, which relies on a tensor-kernel representation. This analysis overcomes the typical difficulties connected to learning in Banach spaces. We finally present a large class of tensor kernels to which our theory fully applies: power series tensor kernels. This type of kernel describes Banach spaces of analytic functions and includes generalizations of the exponential and polynomial kernels as well as, in the complex case, generalizations of the Szegő and Bergman kernels. |
Tasks | |
Published | 2016-03-18 |
URL | http://arxiv.org/abs/1603.05876v2 |
http://arxiv.org/pdf/1603.05876v2.pdf | |
PWC | https://paperswithcode.com/paper/generalized-support-vector-regression-duality |
Repo | |
Framework | |
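A truncated power-series kernel k(x, y) = sum_n a_n <x, y>^n is easy to evaluate directly; with a_n = 1/n! it approximates the exponential kernel, one member of the family the paper generalises. The sketch below is a plain scalar-kernel illustration and does not reproduce the tensor-kernel representation.

```python
import numpy as np
from math import factorial

def power_series_kernel(X, Y, coeffs):
    """k(x, y) = sum_n coeffs[n] * <x, y>**n, evaluated for all pairs (Gram matrix)."""
    G = X @ Y.T                                   # pairwise inner products
    return sum(a * G**n for n, a in enumerate(coeffs))

rng = np.random.default_rng(0)
X = 0.5 * rng.standard_normal((5, 3))

# With a_n = 1/n! the truncated series approximates the exponential kernel exp(<x, y>).
coeffs = [1.0 / factorial(n) for n in range(20)]
K = power_series_kernel(X, X, coeffs)
print(np.allclose(K, np.exp(X @ X.T), atol=1e-4))   # True for these small inner products
```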
Regret Analysis of the Anytime Optimally Confident UCB Algorithm
Title | Regret Analysis of the Anytime Optimally Confident UCB Algorithm |
Authors | Tor Lattimore |
Abstract | I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in hindsight) and comes with the strongest finite-time regret guarantees for a horizon-free algorithm so far. I also show a finite-time lower bound that nearly matches the upper bound. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08661v2 |
http://arxiv.org/pdf/1603.08661v2.pdf | |
PWC | https://paperswithcode.com/paper/regret-analysis-of-the-anytime-optimally |
Repo | |
Framework | |
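For orientation, the sketch below runs vanilla UCB1 on Bernoulli arms and tracks cumulative pseudo-regret; the OCUCB index analysed in the paper uses a more refined confidence level and is not reproduced here.

```python
import numpy as np

def ucb1(means, horizon=5000, seed=0):
    """Vanilla UCB1 on Bernoulli arms (not the OCUCB index analysed in the paper)."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts, sums = np.zeros(k), np.zeros(k)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:                                   # play each arm once
            arm = t - 1
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward
        regret += max(means) - means[arm]
    return regret

print(ucb1([0.5, 0.45, 0.6]))   # cumulative pseudo-regret grows roughly logarithmically
```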
EgoTransfer: Transferring Motion Across Egocentric and Exocentric Domains using Deep Neural Networks
Title | EgoTransfer: Transferring Motion Across Egocentric and Exocentric Domains using Deep Neural Networks |
Authors | Shervin Ardeshir, Krishna Regmi, Ali Borji |
Abstract | Mirror neurons have been observed in the primary motor cortex of primate species, in particular in humans and monkeys. A mirror neuron fires when a person performs a certain action, and also when he observes the same action being performed by another person. A crucial step towards building fully autonomous intelligent systems with human-like learning abilities is the capability to model the mirror neuron. On one hand, the abundance of egocentric cameras in the past few years has offered the opportunity to study many vision problems from the first-person perspective, and a great deal of interesting research has explored various computer vision tasks from the perspective of the self. On the other hand, videos recorded by traditional static cameras capture humans performing different actions from an exocentric, third-person perspective. In this work, we take the first step towards relating motion information across these two perspectives. We train models that predict motion in an egocentric view by observing it from an exocentric view, and vice versa. This allows models to predict how an egocentric motion would look from the outside. To do so, we train linear and nonlinear models and evaluate their performance in terms of retrieving the egocentric (exocentric) motion features while having access to an exocentric (egocentric) motion feature. Our experimental results demonstrate that motion information can be successfully transferred across the two views. |
Tasks | |
Published | 2016-12-17 |
URL | http://arxiv.org/abs/1612.05836v1 |
http://arxiv.org/pdf/1612.05836v1.pdf | |
PWC | https://paperswithcode.com/paper/egotransfer-transferring-motion-across |
Repo | |
Framework | |
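The linear variant of the cross-view mapping can be sketched as ridge regression from exocentric to egocentric motion features, with retrieval by nearest neighbour in the predicted feature space. The features and data below are synthetic placeholders, not the paper's descriptors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_exo, d_ego = 500, 64, 64
W_true = rng.standard_normal((d_exo, d_ego)) / np.sqrt(d_exo)
X_exo = rng.standard_normal((n, d_exo))                          # exocentric motion features
Y_ego = X_exo @ W_true + 0.1 * rng.standard_normal((n, d_ego))   # paired egocentric features

# Ridge regression: W = argmin ||Y - XW||^2 + lam ||W||^2 (closed form).
lam = 1.0
W = np.linalg.solve(X_exo.T @ X_exo + lam * np.eye(d_exo), X_exo.T @ Y_ego)

# Retrieval: predict the egocentric feature for a query exocentric clip and
# return the nearest egocentric clip in the gallery.
query = X_exo[:10]
pred = query @ W
dists = np.linalg.norm(pred[:, None, :] - Y_ego[None, :, :], axis=2)
top1 = dists.argmin(axis=1)
print("top-1 retrieval accuracy on 10 queries:", (top1 == np.arange(10)).mean())
```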
Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis
Title | Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis |
Authors | Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford |
Abstract | This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al., 2009). We provide simple iterative algorithms, with improved runtimes, for solving these problems that are globally linearly convergent with moderate dependencies on the condition numbers and eigenvalue gaps of the matrices involved. We obtain our results by reducing CCA to the top-$k$ generalized eigenvector problem. We solve this problem through a general framework that simply requires black box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent we obtain a running time of $O(\frac{z k \sqrt{\kappa}}{\rho} \log(1/\epsilon) \log \left(k\kappa/\rho\right))$ where $z$ is the total number of nonzero entries, $\kappa$ is the condition number and $\rho$ is the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor. This is essential for handling large-scale matrices that appear in practice. To the best of our knowledge this is the first such algorithm with global linear convergence. We hope that our results prompt further research and ultimately improve the practical running time for performing these important data analysis procedures on large data sets. |
Tasks | |
Published | 2016-04-13 |
URL | http://arxiv.org/abs/1604.03930v2 |
http://arxiv.org/pdf/1604.03930v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-algorithms-for-large-scale |
Repo | |
Framework | |
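The reduction of CCA to a generalized eigenvector problem can be written down directly with dense linear algebra: A w = rho * B w, with A holding the cross-covariance blocks and B the block-diagonal covariances. The sketch below uses SciPy's dense generalized symmetric eigensolver on toy data; the paper's contribution, scalable iterative solvers built on linear-system oracles, is not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh, block_diag

rng = np.random.default_rng(0)
n, dx, dy = 1000, 8, 6
Z = rng.standard_normal((n, 4))                                  # shared latent signal
X = Z @ rng.standard_normal((4, dx)) + 0.5 * rng.standard_normal((n, dx))
Y = Z @ rng.standard_normal((4, dy)) + 0.5 * rng.standard_normal((n, dy))
X, Y = X - X.mean(0), Y - Y.mean(0)

Sxx, Syy = X.T @ X / n, Y.T @ Y / n
Sxy = X.T @ Y / n

# CCA as a generalized eigenproblem  A w = rho * B w  with
#   A = [[0, Sxy], [Syx, 0]],  B = blkdiag(Sxx, Syy).
A = np.block([[np.zeros((dx, dx)), Sxy], [Sxy.T, np.zeros((dy, dy))]])
B = block_diag(Sxx, Syy)
rho, W = eigh(A, B)                  # dense generalized symmetric eigensolver

# The largest eigenvalues are the canonical correlations; eigenvectors hold (u; v).
print("top canonical correlations:", np.round(rho[::-1][:3], 3))
```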