Paper Group ANR 646
Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective
Title | Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective |
Authors | Suryansh Kumar, Anoop Cherian, Yuchao Dai, Hongdong Li |
Abstract | This paper addresses the task of dense non-rigid structure-from-motion (NRSfM) using multiple images. State-of-the-art methods for this problem are often hindered by poor scalability, expensive computations, and noisy measurements. Further, recent NRSfM methods usually either assume a small number of sparse feature points or ignore local non-linearities of shape deformations, and thus cannot reliably model complex non-rigid deformations. To address these issues, we propose a new approach for dense NRSfM by modeling the problem on a Grassmann manifold. Specifically, we assume the complex non-rigid deformations lie on a union of local linear subspaces both spatially and temporally. This naturally allows for a compact representation of the complex non-rigid deformation over frames. We provide experimental results on several synthetic and real benchmark datasets. The results demonstrate that our method, apart from being scalable and more accurate than state-of-the-art methods, is also more robust to noise and generalizes to highly non-linear deformations. |
Tasks | |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00233v2 |
http://arxiv.org/pdf/1803.00233v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-dense-non-rigid-structure-from |
Repo | |
Framework | |
Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors
Title | Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors |
Authors | Arnab Poddar, Md Sahidullah, Goutam Saha |
Abstract | Automatic speaker verification (ASV) is the process of recognizing persons using voice as a biometric. ASV systems show considerable recognition performance given a sufficient amount of speech from matched conditions. One of the crucial challenges of ASV technology is to improve recognition performance with speech segments of short duration. In the short-duration condition, the model parameters are not properly estimated due to inadequate speech information, and this results in poor recognition accuracy even with the state-of-the-art i-vector based ASV system. We hypothesize that considering the estimation quality during the recognition process would help to improve ASV performance. This can be incorporated as a quality measure during fusion of ASV systems. This paper investigates a new quality measure for the i-vector representation of speech utterances, computed directly from Baum-Welch statistics. The proposed metric is subsequently used as a quality measure during fusion of ASV systems. In experiments with the NIST SRE 2008 corpus, we show that inclusion of the proposed quality metric yields considerable improvement in speaker verification performance. The results also indicate the potential of the proposed method in real-world scenarios with short test utterances. |
Tasks | Speaker Verification |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00828v1 |
http://arxiv.org/pdf/1812.00828v1.pdf | |
PWC | https://paperswithcode.com/paper/novel-quality-metric-for-duration-variability |
Repo | |
Framework | |
Improved Fourier Mellin Invariant for Robust Rotation Estimation with Omni-cameras
Title | Improved Fourier Mellin Invariant for Robust Rotation Estimation with Omni-cameras |
Authors | Qingwen Xu, Arturo Gomez Chavez, Heiko Bülow, Andreas Birk, Sören Schwertfeger |
Abstract | Spectral methods such as the improved Fourier Mellin Invariant (iFMI) transform have proved faster, more robust and more accurate than feature based methods for image registration. However, iFMI is restricted to cases where the camera moves in 2D space, and it has not been applied to omni-camera images so far. In this work, we extend the iFMI method and apply a motion model to estimate an omni-camera’s pose when it moves in 3D space. This is particularly useful in field robotics applications to get a rapid and comprehensive view of unstructured environments, and to robustly estimate the robot pose. In the experiment section, we compare the extended iFMI method against ORB and AKAZE feature based approaches on three datasets showing different types of environments: office, lawn and urban scenery (MPI-omni dataset). The results show that our method boosts the accuracy of the robot pose estimation two to four times with respect to the feature registration techniques, while offering lower processing times. Furthermore, the iFMI approach shows the best performance against the motion blur typically present in mobile robotics. |
Tasks | Image Registration, Pose Estimation |
Published | 2018-11-13 |
URL | https://arxiv.org/abs/1811.05306v4 |
https://arxiv.org/pdf/1811.05306v4.pdf | |
PWC | https://paperswithcode.com/paper/improved-fourier-mellin-invariant-for-robust |
Repo | |
Framework | |
Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples with Applications to Neuroimaging
Title | Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples with Applications to Neuroimaging |
Authors | Seong Jae Hwang, Zirui Tao, Won Hwa Kim, Vikas Singh |
Abstract | Generative models using neural networks have opened a door to large-scale studies in various application domains, especially for studies that lack enough real samples to obtain statistically robust inference. Typically, these generative models train on existing data to learn the underlying distribution of the measurements (e.g., images) in latent spaces conditioned on covariates (e.g., image labels), and generate independent samples that are identically distributed in the latent space. Such models may work for cross-sectional studies; however, they are not suitable for generating data for longitudinal studies that focus on “progressive” behavior in a sequence of data. In practice, this is quite a common case in various neuroimaging studies whose goal is to characterize a trajectory of pathologies of a specific disease even from early stages. This may be too ambitious, especially when the sample size is small (e.g., up to a few hundred). Motivated by the setup above, we seek to develop a conditional generative model for longitudinal data generation by designing an invertible neural network. Inspired by the recurrent nature of longitudinal data, we propose a novel neural network that incorporates a recurrent subnetwork and context gating to include smooth transitions in a sequence of generated data. Our model is validated on a video sequence dataset and a longitudinal AD dataset with various experimental settings for qualitative and quantitative evaluations of the generated samples. The results with the AD dataset capture AD specific group differences, with sufficiently many generated longitudinal samples, that are consistent with existing literature, which implies great potential for application to other disease studies. |
Tasks | |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09897v2 |
http://arxiv.org/pdf/1811.09897v2.pdf | |
PWC | https://paperswithcode.com/paper/conditional-recurrent-flow-conditional |
Repo | |
Framework | |
Phonetic-attention scoring for deep speaker features in speaker verification
Title | Phonetic-attention scoring for deep speaker features in speaker verification |
Authors | Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang |
Abstract | Recent studies have shown that frame-level deep speaker features can be derived from a deep neural network with the training target set to discriminate speakers by a short speech segment. By pooling the frame-level features, utterance-level representations, called d-vectors, can be derived and used in the automatic speaker verification (ASV) task. This simple average pooling, however, is inherently sensitive to the phonetic content of the utterance. An interesting idea borrowed from machine translation is the attention-based mechanism, where the contribution of an input word to the translation at a particular time is weighted by an attention score. This score reflects the relevance of the input word to the present translation. We can use the same idea to align utterances with different phonetic contents. This paper proposes a phonetic-attention scoring approach for d-vector systems. In this approach, an attention score is computed for each frame pair. This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring. This new scoring approach emphasizes the frame pairs with similar phonetic contents, which essentially provides a soft alignment for utterances with any phonetic contents. Experimental results show that, compared with naive average pooling, this phonetic-attention scoring approach delivers consistent performance improvement in both text-dependent and text-independent ASV tasks. |
Tasks | Machine Translation, Speaker Verification |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03255v1 |
http://arxiv.org/pdf/1811.03255v1.pdf | |
PWC | https://paperswithcode.com/paper/phonetic-attention-scoring-for-deep-speaker |
Repo | |
Framework | |
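The frame-pair weighting described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the softmax normalization over all frame pairs, the `sharpness` parameter, and the function names are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_pooling_score(spk_a, spk_b):
    """Baseline: average the frame-level speaker features, then compare."""
    def mean(frames):
        return [sum(col) / len(frames) for col in zip(*frames)]
    return cosine(mean(spk_a), mean(spk_b))


def phonetic_attention_score(spk_a, pho_a, spk_b, pho_b, sharpness=5.0):
    """Attention-weighted score over all frame pairs (hypothetical sketch).

    spk_*: per-frame speaker features; pho_*: per-frame phonetic features.
    The attention weight of pair (i, j) grows with the phonetic similarity
    of frame i in utterance A and frame j in utterance B (assumed: softmax
    over scaled cosine similarity).
    """
    pairs = [(i, j) for i in range(len(spk_a)) for j in range(len(spk_b))]
    logits = [sharpness * cosine(pho_a[i], pho_b[j]) for i, j in pairs]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return sum(
        (w / total) * cosine(spk_a[i], spk_b[j])
        for w, (i, j) in zip(weights, pairs)
    )
```

A larger `sharpness` concentrates the weight on phonetically matching frame pairs, so the score approaches a hard alignment; a value of zero weights all frame pairs equally.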
About Nonstandard Neutrosophic Logic (Answers to Imamura ‘Note on the Definition of Neutrosophic Logic’)
Title | About Nonstandard Neutrosophic Logic (Answers to Imamura ‘Note on the Definition of Neutrosophic Logic’) |
Authors | Florentin Smarandache |
Abstract | In order to more accurately situate and fit neutrosophic logic into the framework of nonstandard analysis, we present the neutrosophic inequalities, neutrosophic equality, neutrosophic infimum and supremum, and neutrosophic standard intervals, including the cases when the standard and nonstandard neutrosophic logic components T, I, F take values outside the classical real unit interval [0, 1], together with a brief evolution of neutrosophic operators. The paper intends to answer Imamura's criticism, which we found beneficial for better understanding nonstandard neutrosophic logic, although nonstandard neutrosophic logic has never been used in practical applications. |
Tasks | |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1812.02534v2 |
http://arxiv.org/pdf/1812.02534v2.pdf | |
PWC | https://paperswithcode.com/paper/about-nonstandard-neutrosophic-logic-answers |
Repo | |
Framework | |
Deep Surface Light Fields
Title | Deep Surface Light Fields |
Authors | Anpei Chen, Minye Wu, Yingliang Zhang, Nianyi Li, Jie Lu, Shenghua Gao, Jingyi Yu |
Abstract | A surface light field represents the radiance of rays originating from any point on the surface in any direction. Traditional approaches require ultra-dense sampling to ensure the rendering quality. In this paper, we present a novel neural network based technique called deep surface light field, or DSLF, that uses only moderate sampling for high fidelity rendering. DSLF automatically fills in the missing data by leveraging different sampling patterns across the vertices and at the same time eliminates redundancies thanks to the network’s prediction capability. For real data, we address the image registration problem and conduct texture-aware remeshing to align texture edges with vertices and avoid blurring. Comprehensive experiments show that DSLF can further achieve a high data compression ratio while facilitating real-time rendering on the GPU. |
Tasks | Image Registration |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06514v1 |
http://arxiv.org/pdf/1810.06514v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-surface-light-fields |
Repo | |
Framework | |
Towards a universal neural network encoder for time series
Title | Towards a universal neural network encoder for time series |
Authors | Joan Serrà, Santiago Pascual, Alexandros Karatzoglou |
Abstract | We study the use of a time series encoder to learn representations that are useful on data set types on which it has not been trained. The encoder consists of a convolutional neural network whose temporal output is summarized by a convolutional attention mechanism. This way, we obtain a compact, fixed-length representation from longer, variable-length time series. We evaluate the performance of the proposed approach on a well-known time series classification benchmark, considering full adaptation, partial adaptation, and no adaptation of the encoder to the new data type. Results show that such strategies are competitive with the state-of-the-art, often outperforming conceptually-matching approaches. Besides accuracy scores, the ease of adaptation and the efficiency of pre-trained encoders make them an appealing option for the processing of scarcely- or non-labeled time series. |
Tasks | Time Series, Time Series Classification |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.03908v1 |
http://arxiv.org/pdf/1805.03908v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-universal-neural-network-encoder |
Repo | |
Framework | |
IDEL: In-Database Entity Linking with Neural Embeddings
Title | IDEL: In-Database Entity Linking with Neural Embeddings |
Authors | Torsten Kilias, Alexander Löser, Felix A. Gers, Richard Koopmanschap, Ying Zhang, Martin Kersten |
Abstract | We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first de facto implemented system integrating entity linking in a database. We leverage the ability of MonetDB to support in-database analytics with user defined functions (UDFs) implemented in Python. These functions call machine learning libraries for neural text mining, such as TensorFlow. The system achieves zero cost for data shipping and transformation by utilizing MonetDB’s ability to embed Python processes in the database kernel and exchange data in NumPy arrays. IDEL represents text and relational data in a joint vector space with neural embeddings and can compensate for errors caused by ambiguous entity representations. For detecting matching entities, we propose a novel similarity function based on joint neural embeddings which are learned by minimizing a pairwise contrastive ranking loss. This function utilizes high-dimensional index structures for fast retrieval of matching entities. Our first implementation and experiments using the WebNLG corpus show the effectiveness and the potential of IDEL. |
Tasks | Entity Linking |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04884v1 |
http://arxiv.org/pdf/1803.04884v1.pdf | |
PWC | https://paperswithcode.com/paper/idel-in-database-entity-linking-with-neural |
Repo | |
Framework | |
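The pairwise contrastive ranking loss mentioned in the IDEL abstract can be sketched in its common hinge form. The abstract does not give the exact formulation, so the cosine similarity, the `margin` value, and the function names below are assumptions, and the brute-force `best_match` stands in for the index-based retrieval the paper describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(text_vec, matching_entity, other_entities, margin=0.2):
    """Hinge-style ranking loss (assumed form, not taken from the paper).

    The loss is zero once the matching entity is more similar to the text
    embedding than every non-matching entity by at least `margin`.
    """
    positive = cosine(text_vec, matching_entity)
    return sum(
        max(0.0, margin - positive + cosine(text_vec, negative))
        for negative in other_entities
    )


def best_match(text_vec, entity_vecs):
    """Brute-force retrieval: index of the most similar entity embedding."""
    return max(range(len(entity_vecs)), key=lambda i: cosine(text_vec, entity_vecs[i]))
```

Minimizing this loss pushes matching text and entity embeddings together in the joint space, which is what makes nearest-neighbour lookup usable as the linking step.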
Fast, Parameter free Outlier Identification for Robust PCA
Title | Fast, Parameter free Outlier Identification for Robust PCA |
Authors | Vishnu Menon, Sheetal Kalyani |
Abstract | Robust PCA, the problem of PCA in the presence of outliers, has been extensively investigated in the last few years. Here we focus on Robust PCA in the column sparse outlier model. The existing methods for the column sparse outlier model assume knowledge of either the dimension of the lower dimensional subspace or the fraction of outliers in the system. However, in many applications knowledge of these parameters is not available. Motivated by this, we propose a parameter free outlier identification method for robust PCA which a) does not require knowledge of the outlier fraction, b) does not require knowledge of the dimension of the underlying subspace, and c) is computationally simple and fast. Further, analytical guarantees are derived for outlier identification and the performance of the algorithm is compared with the existing state of the art methods. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04791v1 |
http://arxiv.org/pdf/1804.04791v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-parameter-free-outlier-identification |
Repo | |
Framework | |
Deep 2.5D Vehicle Classification with Sparse SfM Depth Prior for Automated Toll Systems
Title | Deep 2.5D Vehicle Classification with Sparse SfM Depth Prior for Automated Toll Systems |
Authors | Georg Waltner, Michael Maurer, Thomas Holzmann, Patrick Ruprecht, Michael Opitz, Horst Possegger, Friedrich Fraundorfer, Horst Bischof |
Abstract | Automated toll systems rely on proper classification of the passing vehicles. This is especially difficult when the images used for classification only cover parts of the vehicle. To obtain information about the whole vehicle, we reconstruct the vehicle as a 3D object and exploit this additional information within a Convolutional Neural Network (CNN). However, when using deep networks for 3D object classification, large amounts of dense 3D models are required for good accuracy, which are often neither available nor feasible to process due to memory requirements. Therefore, in our method we reproject the 3D object onto the image plane using the reconstructed points, lines or both. We utilize this sparse depth prior within an auxiliary network branch that acts as a regularizer during training. We show that this auxiliary regularizer helps to improve accuracy compared to 2D classification on a real-world dataset. Furthermore, due to the design of the network, only the 2D camera images are required for classification at test time, which enables use in portable computer vision systems. |
Tasks | 3D Object Classification, Object Classification |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03511v2 |
http://arxiv.org/pdf/1805.03511v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-25d-vehicle-classification-with-sparse |
Repo | |
Framework | |
Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
Title | Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer |
Authors | Sudha Rao, Joel Tetreault |
Abstract | Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics. |
Tasks | Machine Translation, Style Transfer |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1803.06535v2 |
http://arxiv.org/pdf/1803.06535v2.pdf | |
PWC | https://paperswithcode.com/paper/dear-sir-or-madam-may-i-introduce-the-gyafc |
Repo | |
Framework | |
Age Group Classification with Speech and Metadata Multimodality Fusion
Title | Age Group Classification with Speech and Metadata Multimodality Fusion |
Authors | Denys Katerenchuk |
Abstract | Children comprise a significant proportion of TV viewers and it is worthwhile to customize the experience for them. However, identifying who is a child in the audience can be a challenging task. Identifying gender and age from audio commands is a well-studied problem, but it is still very challenging to achieve good accuracy when the utterances are typically only a couple of seconds long. We present initial studies of a novel method which combines utterances with user metadata. In particular, we develop an ensemble of different machine learning techniques on different subsets of data to improve child detection. Our initial results show a 9.2% absolute improvement over the baseline, leading to state-of-the-art performance. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00721v1 |
http://arxiv.org/pdf/1803.00721v1.pdf | |
PWC | https://paperswithcode.com/paper/age-group-classification-with-speech-and |
Repo | |
Framework | |
Mean Reverting Portfolios via Penalized OU-Likelihood Estimation
Title | Mean Reverting Portfolios via Penalized OU-Likelihood Estimation |
Authors | Jize Zhang, Tim Leung, Aleksandr Y. Aravkin |
Abstract | We study an optimization-based approach to construct a mean-reverting portfolio of assets. Our objectives are threefold: (1) design a portfolio that is well-represented by an Ornstein-Uhlenbeck process with parameters estimated by maximum likelihood, (2) select portfolios with desirable characteristics of high mean reversion and low variance, and (3) select a parsimonious portfolio, i.e. find a small subset of a larger universe of assets that can be used for long and short positions. We present the full problem formulation, a specialized algorithm that exploits partial minimization, and numerical examples using both simulated and empirical price data. |
Tasks | |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1803.06460v1 |
http://arxiv.org/pdf/1803.06460v1.pdf | |
PWC | https://paperswithcode.com/paper/mean-reverting-portfolios-via-penalized-ou |
Repo | |
Framework | |
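As background for objective (1) in the abstract above, the sketch below fits an Ornstein-Uhlenbeck process dX = kappa (theta - X) dt + sigma dW to one fixed portfolio value series by conditional maximum likelihood, using the exact AR(1) discretization. It is a textbook estimator, not the paper's penalized formulation or algorithm; the function names and the equally spaced sampling interval `dt` are assumptions.

```python
import math


def portfolio_values(weights, asset_prices):
    """Value series of a fixed-weight portfolio; asset_prices[t][i] is asset i at time t."""
    return [sum(w * p for w, p in zip(weights, row)) for row in asset_prices]


def fit_ou(values, dt):
    """Conditional MLE of (kappa, theta, sigma) for an equally spaced OU sample.

    Uses X[t+1] = a + b * X[t] + noise with b = exp(-kappa * dt),
    a = theta * (1 - b), and noise variance sigma^2 * (1 - b^2) / (2 * kappa).
    """
    x, y = values[:-1], values[1:]
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # least-squares slope
    if not 0.0 < b < 1.0:
        raise ValueError("series does not look mean reverting (need 0 < b < 1)")
    a = (sy - b * sx) / n  # least-squares intercept
    noise_var = sum((v - a - b * u) ** 2 for u, v in zip(x, y)) / n
    kappa = -math.log(b) / dt
    theta = a / (1.0 - b)
    sigma = math.sqrt(2.0 * kappa * noise_var / (1.0 - b * b))
    return kappa, theta, sigma
```

A large `kappa` together with a small `sigma` corresponds to the "high mean reversion and low variance" characteristics named in objective (2); the paper optimizes over the portfolio weights as well, which this sketch does not attempt.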
Dynamic Assortment Selection under the Nested Logit Models
Title | Dynamic Assortment Selection under the Nested Logit Models |
Authors | Xi Chen, Yining Wang, Yuan Zhou |
Abstract | We study a stylized dynamic assortment planning problem during a selling season of finite length $T$, by considering a nested multinomial logit model with $M$ nests and $N$ items per nest. Our policy simultaneously learns customers’ choice behavior and makes dynamic decisions on assortments based on the current knowledge. It achieves the regret at the order of $\tilde{O}(\sqrt{MNT}+MN^2)$, where $M$ is the number of nests and $N$ is the number of products in each nest. We further provide a lower bound result of $\Omega(\sqrt{MT})$, which shows the optimality of the upper bound when $T>M$ and $N$ is small. However, the $N^2$ term in the upper bound is not ideal for applications where $N$ is large as compared to $T$. To address this issue, we further generalize our first policy by introducing a discretization technique, which leads to a regret of $\tilde{O}(\sqrt{M}T^{2/3}+MNT^{1/3})$ with a specific choice of discretization granularity. It improves the previous regret bound whenever $N>T^{1/3}$. We provide numerical results to demonstrate the empirical performance of both proposed policies. |
Tasks | |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10410v1 |
http://arxiv.org/pdf/1806.10410v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-assortment-selection-under-the-nested |
Repo | |
Framework | |
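For readers unfamiliar with the choice model named in the abstract above, the sketch below computes purchase probabilities under a standard nested multinomial logit model for a given assortment. The parameterization (per-nest dissimilarity parameters `gammas`, a no-purchase option with preference weight 1) follows the usual textbook form and is an assumption here; it does not implement the paper's learning policies.

```python
import math


def nested_logit_probabilities(utilities, gammas):
    """Choice probabilities under a standard nested logit model (assumed form).

    utilities[m] lists the mean utilities of the items offered in nest m
    (an empty list means nothing is offered from that nest), and gammas[m]
    is that nest's dissimilarity parameter. Returns (probs, p_no_purchase),
    where probs[m][i] is the probability that item i of nest m is chosen.
    """
    prefs = [[math.exp(u) for u in nest] for nest in utilities]
    nest_totals = [sum(nest) for nest in prefs]
    # Attractiveness of each nest: (sum of item preferences) ** gamma.
    nest_weights = [t ** g if t > 0 else 0.0 for t, g in zip(nest_totals, gammas)]
    denom = 1.0 + sum(nest_weights)  # the 1.0 is the no-purchase option
    probs = [
        [(w / denom) * (p / t) for p in nest]  # P(nest) * P(item | nest)
        for nest, t, w in zip(prefs, nest_totals, nest_weights)
    ]
    return probs, 1.0 / denom
```

With every `gamma` equal to 1 the model reduces to the plain multinomial logit; values below 1 make items within the same nest closer substitutes for one another.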