July 27, 2019

3133 words · 15 min read

Paper Group ANR 737

A projection pursuit framework for testing general high-dimensional hypothesis

Title A projection pursuit framework for testing general high-dimensional hypothesis
Authors Yinchu Zhu, Jelena Bradic
Abstract This article develops a framework for testing general hypotheses in high-dimensional models where the number of variables may far exceed the number of observations. Existing literature has considered fewer than a handful of hypotheses, such as testing individual coordinates of the model parameter. However, the problem of testing general and complex hypotheses remains wide open. We propose a new inference method developed around the hypothesis-adaptive projection pursuit framework, which solves the testing problem in the most general case. The proposed inference is centered around a new class of estimators defined as the $l_1$ projection of an initial guess of the unknown onto the space defined by the null. This projection automatically takes into account the structure of the null hypothesis and allows us to study formal inference for a number of long-standing problems. For example, we can directly conduct inference on the sparsity level of the model parameters and the minimum signal strength. This is especially significant given that the former is a fundamental condition underlying most theoretical developments in high-dimensional statistics, while the latter is a key condition used to establish variable selection properties. Moreover, the proposed method is asymptotically exact and has satisfactory power properties for testing very general functionals of the high-dimensional parameters. The simulation studies lend further support to our theoretical claims and additionally show excellent finite-sample size and power properties of the proposed test.
Tasks
Published 2017-05-02
URL http://arxiv.org/abs/1705.01024v1
PDF http://arxiv.org/pdf/1705.01024v1.pdf
PWC https://paperswithcode.com/paper/a-projection-pursuit-framework-for-testing
Repo
Framework
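The core construction in the abstract, an $l_1$ projection of an initial estimate onto the parameter set defined by the null, has a simple closed form for the most basic null, $H_0: \beta_j = 0$ for $j \in S$. A minimal numpy sketch of that special case (the index set and initial estimate below are illustrative, not taken from the paper):

```python
import numpy as np

def l1_project_onto_null(beta_init, null_idx):
    """l1-project beta_init onto {b : b_j = 0 for j in null_idx}.

    Minimizing ||b - beta_init||_1 subject to these constraints
    decouples coordinate-wise: constrained coordinates are set to 0,
    all other coordinates are left unchanged.
    """
    b = beta_init.copy()
    b[null_idx] = 0.0
    # l1 distance to the null set = total signal mass on the tested coords
    dist = np.abs(beta_init[null_idx]).sum()
    return b, dist

beta_init = np.array([0.5, -1.2, 0.0, 2.0])
b, dist = l1_project_onto_null(beta_init, [1, 3])
print(b)     # [0.5 0.  0.  0. ]
print(dist)  # 3.2
```

For general nulls the projection has no closed form and is computed as a convex program; the distance to the null set is what the test statistic is built from.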

Attention-Based End-to-End Speech Recognition on Voice Search

Title Attention-Based End-to-End Speech Recognition on Voice Search
Authors Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie
Abstract Recently, there has been a growing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on a voice search task. Previous attempts have shown that applying attention-based encoder-decoder models to Mandarin speech recognition is quite difficult due to the logographic orthography of Mandarin, the large vocabulary and the conditional dependency of the attention model. In this paper, we use character embedding to deal with the large vocabulary. Several tricks are used for effective model training, including L2 regularization, Gaussian weight noise and frame skipping. We compare two attention mechanisms and use attention smoothing to cover long context in the attention model. Taken together, these tricks allow us to achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on the MiTV voice search dataset. When combined with a trigram language model, the CER and SER reach 2.81% and 5.77%, respectively.
Tasks End-To-End Speech Recognition, L2 Regularization, Language Modelling, Speech Recognition
Published 2017-07-22
URL http://arxiv.org/abs/1707.07167v3
PDF http://arxiv.org/pdf/1707.07167v3.pdf
PWC https://paperswithcode.com/paper/attention-based-end-to-end-speech-recognition
Repo
Framework
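The abstract mentions attention smoothing to cover longer context. One common way to flatten an attention distribution is a softmax temperature above 1; the paper's exact smoothing variant may differ, so treat this numpy sketch as illustrative:

```python
import numpy as np

def smoothed_attention(scores, temperature=1.0):
    """Temperature-scaled softmax over alignment scores.

    temperature > 1 flattens the distribution, so attention mass is
    spread over a longer context window instead of a single frame
    (one common form of 'attention smoothing').
    """
    z = scores / temperature
    z = z - z.max()          # numerical stability
    w = np.exp(z)
    return w / w.sum()

scores = np.array([4.0, 1.0, 0.5, 0.2])
sharp = smoothed_attention(scores, temperature=1.0)
smooth = smoothed_attention(scores, temperature=3.0)
# the smoothed weights are less peaked on the top-scoring frame
print(sharp.max() > smooth.max())  # True
```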

ResumeVis: A Visual Analytics System to Discover Semantic Information in Semi-structured Resume Data

Title ResumeVis: A Visual Analytics System to Discover Semantic Information in Semi-structured Resume Data
Authors Chen Zhang, Hao Wang, Yingcai Wu
Abstract Massive public resume data emerging on the WWW indicates individual-related characteristics in terms of profiles and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as talent seeking and evaluation. Existing RA studies based on statistical analysis have primarily focused on talent recruitment by identifying explicit attributes. However, they fail to discover implicit semantic information, i.e., individual career progress patterns and social relations, which are vital to a comprehensive understanding of career development. Moreover, visualizing this information for better human cognition is also challenging. To tackle these issues, we propose a visual analytics system, ResumeVis, to mine and visualize resume data. First, a text-mining-based approach is presented to extract semantic information. Then, a set of visualizations is devised to represent the semantic information from multiple perspectives. Through interactive exploration of ResumeVis by domain experts, the following tasks can be accomplished: tracing individual career trajectories; mining latent social relations among individuals; and grasping the full picture of the collective mobility of massive resumes. Case studies with over 2500 online officer resumes demonstrate the effectiveness of our system. We provide a demonstration video.
Tasks
Published 2017-05-15
URL http://arxiv.org/abs/1705.05206v1
PDF http://arxiv.org/pdf/1705.05206v1.pdf
PWC https://paperswithcode.com/paper/resumevis-a-visual-analytics-system-to
Repo
Framework

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing

Title An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Authors Marcin Junczys-Dowmunt, Roman Grundkiewicz
Abstract In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output. We focus on neural end-to-end models that combine both inputs $mt$ (raw MT output) and $src$ (source language input) in a single neural architecture, modeling ${mt, src} \rightarrow pe$ directly. Apart from that, we investigate the influence of hard-attention models which seem to be well-suited for monolingual tasks, as well as combinations of both ideas. We report results on data sets provided during the WMT-2016 shared task on automatic post-editing and can demonstrate that dual-attention models that incorporate all available data in the APE scenario in a single model improve on the best shared task system and on all other published results after the shared task. Dual-attention models that are combined with hard attention remain competitive despite applying fewer changes to the input.
Tasks Automatic Post-Editing, Machine Translation
Published 2017-06-13
URL http://arxiv.org/abs/1706.04138v2
PDF http://arxiv.org/pdf/1706.04138v2.pdf
PWC https://paperswithcode.com/paper/an-exploration-of-neural-sequence-to-sequence
Repo
Framework

Robust features for facial action recognition

Title Robust features for facial action recognition
Authors Nadav Israel, Lior Wolf, Ran Barzilay, Gal Shoval
Abstract Automatic recognition of facial gestures is becoming increasingly important as real-world AI agents become a reality. In this paper, we present an automated system that recognizes facial gestures by capturing local changes and encoding the motion into a histogram of frequencies. We evaluate the proposed method by demonstrating its effectiveness on spontaneous facial action benchmarks: the FEEDTUM dataset, the Pain dataset and the HMDB51 dataset. The results show that, compared to known methods, the new encoding methods significantly improve recognition accuracy and robustness of analysis for a variety of applications.
Tasks Temporal Action Localization
Published 2017-02-05
URL http://arxiv.org/abs/1702.01426v2
PDF http://arxiv.org/pdf/1702.01426v2.pdf
PWC https://paperswithcode.com/paper/robust-features-for-facial-action-recognition
Repo
Framework
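The key idea in the abstract, encoding local motion as a histogram of frequencies, can be sketched roughly as: take each pixel's temporal intensity trace, find its dominant FFT frequency, and histogram those over the patch. This is an illustrative approximation of the idea, not the paper's exact descriptor:

```python
import numpy as np

def frequency_histogram(pixel_series, n_bins=8):
    """Encode local motion as a histogram of dominant temporal frequencies.

    pixel_series: (n_pixels, n_frames) array of intensity traces.
    For each pixel: remove the mean, take the FFT magnitude, find the
    dominant non-DC frequency bin, then accumulate a normalized
    histogram of those dominant bins over the patch.
    """
    centered = pixel_series - pixel_series.mean(axis=1, keepdims=True)
    spec = np.abs(np.fft.rfft(centered, axis=1))
    dominant = spec[:, 1:].argmax(axis=1)        # skip the DC bin
    hist, _ = np.histogram(dominant, bins=n_bins,
                           range=(0, spec.shape[1] - 1))
    return hist / hist.sum()

t = np.arange(32)
fast = np.sin(2 * np.pi * 8 * t / 32)   # rapid local change
slow = np.sin(2 * np.pi * 1 * t / 32)   # slow local change
h = frequency_histogram(np.stack([fast, slow] * 5))
print(h.sum())  # 1.0
```

The resulting histogram is translation-invariant in time, which is what makes such encodings robust to when within a clip the facial action occurs.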

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Title Characterization of Gradient Dominance and Regularity Conditions for Neural Networks
Authors Yi Zhou, Yingbin Liang
Abstract The past decade has witnessed the successful application of deep learning to many challenging problems in machine learning and artificial intelligence. However, the loss functions of deep neural networks (especially nonlinear networks) are still far from being well understood from a theoretical perspective. In this paper, we enrich the current understanding of the landscape of the square loss functions for three types of neural networks. Specifically, when the parameter matrices are square, we provide an explicit characterization of the global minimizers for linear networks, linear residual networks, and nonlinear networks with one hidden layer. Then, we establish two quadratic landscape properties for the square loss of these neural networks: the gradient dominance condition within the neighborhood of their full-rank global minimizers, and the regularity condition along certain directions and within the neighborhood of their global minimizers. These two landscape properties are desirable for optimization around the global minimizers of the loss function for these neural networks.
Tasks
Published 2017-10-18
URL http://arxiv.org/abs/1710.06910v2
PDF http://arxiv.org/pdf/1710.06910v2.pdf
PWC https://paperswithcode.com/paper/characterization-of-gradient-dominance-and
Repo
Framework

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

Title Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks
Authors Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei
Abstract Prediction of popularity has a profound impact on social media, since it offers opportunities to reveal individual preferences and public attention in evolving social systems. Previous research, although achieving promising results, neglects one distinctive characteristic of social data, i.e., sequentiality. For example, the popularity of online content is generated over time with sequential post streams on social media. To investigate the sequential prediction of popularity, we propose a novel prediction framework called Deep Temporal Context Networks (DTCN) that takes both temporal context and temporal attention into account. Our DTCN contains three main components, from embedding and learning to predicting. With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space. Then, based on the embedded data sequence over time, temporal context learning attempts to recurrently learn two adaptive temporal contexts for sequential popularity. Finally, a novel temporal attention is designed to predict new popularity (the popularity of a new user-post pair) with temporal coherence across multiple time scales. Experiments on our released image dataset of about 600K Flickr photos demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms, with an average relative improvement of 21.51% in popularity prediction (Spearman Ranking Correlation).
Tasks
Published 2017-12-12
URL http://arxiv.org/abs/1712.04443v1
PDF http://arxiv.org/pdf/1712.04443v1.pdf
PWC https://paperswithcode.com/paper/sequential-prediction-of-social-media
Repo
Framework
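The temporal attention component described above, weighting past post embeddings by their relevance to a new user-post pair, can be sketched generically; DTCN's actual parameterization, with learned projections and multiple time scales, is richer than this:

```python
import numpy as np

def temporal_attention(query, context):
    """Dot-product temporal attention over a post stream.

    query:   (d,)    embedding of the new user-post pair
    context: (T, d)  embeddings of the preceding post stream

    Returns a (d,) temporal summary: past embeddings weighted by
    their (softmax-normalized) similarity to the query.
    """
    scores = context @ query        # (T,) relevance of each past post
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ context        # weighted temporal summary

rng = np.random.default_rng(1)
ctx = rng.normal(size=(5, 4))       # 5 past posts, 4-dim embeddings
out = temporal_attention(ctx[0], ctx)
print(out.shape)  # (4,)
```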

Fast Predictive Multimodal Image Registration

Title Fast Predictive Multimodal Image Registration
Authors Xiao Yang, Roland Kwitt, Martin Styner, Marc Niethammer
Abstract We introduce a deep encoder-decoder architecture for image deformation prediction from multimodal images. Specifically, we design an image-patch-based deep network that jointly (i) learns an image similarity measure and (ii) the relationship between image patches and deformation parameters. While our method can be applied to general image registration formulations, we focus on the Large Deformation Diffeomorphic Metric Mapping (LDDMM) registration model. By predicting the initial momentum of the shooting formulation of LDDMM, we preserve its mathematical properties and drastically reduce the computation time, compared to optimization-based approaches. Furthermore, we create a Bayesian probabilistic version of the network that allows evaluation of registration uncertainty via sampling of the network at test time. We evaluate our method on a 3D brain MRI dataset using both T1- and T2-weighted images. Our experiments show that our method generates accurate predictions and that learning the similarity measure leads to more consistent registrations than relying on generic multimodal image similarity measures, such as mutual information. Our approach is an order of magnitude faster than optimization-based LDDMM.
Tasks Image Registration
Published 2017-03-31
URL http://arxiv.org/abs/1703.10902v1
PDF http://arxiv.org/pdf/1703.10902v1.pdf
PWC https://paperswithcode.com/paper/fast-predictive-multimodal-image-registration
Repo
Framework

Deep-learning-based data page classification for holographic memory

Title Deep-learning-based data page classification for holographic memory
Authors Tomoyoshi Shimobaba, Naoki Kuwata, Mizuha Homma, Takayuki Takahashi, Yuki Nagahama, Marie Sano, Satoki Hasegawa, Ryuji Hirayama, Takashi Kakue, Atsushi Shiraki, Naoki Takada, Tomoyoshi Ito
Abstract We propose a deep-learning-based classification of the data pages used in holographic memory. We numerically investigated the classification performance of a conventional multi-layer perceptron (MLP) and a deep neural network, under the condition that reconstructed page data are contaminated by noise and randomly laterally shifted. The MLP was found to have a classification accuracy of 91.58%, whereas the deep neural network classified data pages with an accuracy of 99.98%; that is, its error rate is more than two orders of magnitude lower than that of the MLP.
Tasks
Published 2017-07-02
URL http://arxiv.org/abs/1707.00684v1
PDF http://arxiv.org/pdf/1707.00684v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-data-page-classification
Repo
Framework

Attend and Interact: Higher-Order Object Interactions for Video Understanding

Title Attend and Interact: Higher-Order Object Interactions for Video Understanding
Authors Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Abstract Human actions often involve complex interactions across several inter-related objects in a scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single-object representations or pairwise object relationships. Furthermore, learning interactions across multiple objects over hundreds of video frames is computationally infeasible, and performance may suffer since a large combinatorial space has to be modeled. In this paper, we propose to efficiently learn higher-order interactions between arbitrary subgroups of objects for fine-grained video understanding. We demonstrate that modeling object interactions significantly improves accuracy for both action recognition and video captioning, while requiring over 3 times less computation than modeling traditional pairwise relationships. The proposed method is validated on two large-scale datasets: Kinetics and ActivityNet Captions. Our SINet and SINet-Caption achieve state-of-the-art performance on both datasets even though the videos are sampled at a maximum of 1 FPS. To the best of our knowledge, this is the first work to model object interactions on open-domain large-scale video datasets, and we additionally model higher-order object interactions, which improves performance at low computational cost.
Tasks Action Classification, Temporal Action Localization, Video Captioning, Video Classification, Video Description, Video Understanding
Published 2017-11-16
URL http://arxiv.org/abs/1711.06330v2
PDF http://arxiv.org/pdf/1711.06330v2.pdf
PWC https://paperswithcode.com/paper/attend-and-interact-higher-order-object
Repo
Framework

Path Integral Networks: End-to-End Differentiable Optimal Control

Title Path Integral Networks: End-to-End Differentiable Optimal Control
Authors Masashi Okada, Luca Rigazio, Takenobu Aoshima
Abstract In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, used for optimal-control-based planning. PI-Net is fully differentiable, learning both dynamics and cost models end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn the dynamics and cost models latent in the demonstrations.
Tasks Continuous Control, Imitation Learning
Published 2017-06-29
URL http://arxiv.org/abs/1706.09597v1
PDF http://arxiv.org/pdf/1706.09597v1.pdf
PWC https://paperswithcode.com/paper/path-integral-networks-end-to-end
Repo
Framework
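PI-Net unrolls the path integral optimal control algorithm as a differentiable network. The underlying (non-learned) iteration is the familiar MPPI-style update: sample control perturbations, roll out the dynamics, and re-weight the perturbations by exponentiated trajectory cost. A sketch of that plain algorithm on a toy 1-D linear system; the hyperparameters below are illustrative, not the paper's:

```python
import numpy as np

def path_integral_update(u, dynamics, cost, x0,
                         n_samples=256, sigma=0.5, lam=1.0, rng=None):
    """One MPPI-style path-integral update of a nominal control sequence.

    Sample noisy perturbations of u, roll out the dynamics from x0,
    then average the perturbations with weights exp(-cost / lam).
    PI-Net makes this computation differentiable end-to-end; here it
    is just the plain algorithmic step.
    """
    rng = rng or np.random.default_rng(0)
    T = len(u)
    eps = rng.normal(scale=sigma, size=(n_samples, T))
    costs = np.empty(n_samples)
    for k in range(n_samples):
        x, c = x0, 0.0
        for t in range(T):
            x = dynamics(x, u[t] + eps[k, t])
            c += cost(x)
        costs[k] = c
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u + w @ eps                  # cost-weighted control update

# toy 1-D integrator: drive the state from x0 = 1 toward 0
dyn = lambda x, a: x + 0.1 * a
cst = lambda x: x ** 2
u = np.zeros(10)
for _ in range(20):
    u = path_integral_update(u, dyn, cst, x0=1.0)
print(u[0] < 0)  # True: the first action pushes the state toward 0
```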

Grounded Objects and Interactions for Video Captioning

Title Grounded Objects and Interactions for Video Captioning
Authors Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Abstract We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding with often limited or no emphasis on object interactions to address the problem of video understanding. In this paper, we propose SINet-Caption, which learns to generate captions grounded in higher-order interactions between arbitrary groups of objects for fine-grained video understanding. We discuss the challenges and benefits of such an approach. We further demonstrate state-of-the-art results on the ActivityNet Captions dataset using our model, SINet-Caption, based on this approach.
Tasks Scene Understanding, Text Generation, Video Captioning, Video Understanding
Published 2017-11-16
URL http://arxiv.org/abs/1711.06354v1
PDF http://arxiv.org/pdf/1711.06354v1.pdf
PWC https://paperswithcode.com/paper/grounded-objects-and-interactions-for-video
Repo
Framework

From Data to City Indicators: A Knowledge Graph for Supporting Automatic Generation of Dashboards

Title From Data to City Indicators: A Knowledge Graph for Supporting Automatic Generation of Dashboards
Authors Henrique Santos, Victor Dantas, Vasco Furtado, Paulo Pinheiro, Deborah L. McGuinness
Abstract In the context of Smart Cities, indicator definitions have been used to calculate values that enable comparison among different cities. Calculating indicator values is challenging, as the calculation may need to combine several aspects of quality while addressing different levels of abstraction. Knowledge graphs (KGs) have been used successfully to support flexible representation, which can support improved understanding and data analysis in similar settings. This paper presents an operational description for a city KG, an indicator ontology that supports indicator discovery and data visualization, and an application capable of performing metadata analysis to automatically build and display dashboards according to discovered indicators. We describe our implementation in an urban mobility setting.
Tasks Knowledge Graphs
Published 2017-04-06
URL http://arxiv.org/abs/1704.01946v1
PDF http://arxiv.org/pdf/1704.01946v1.pdf
PWC https://paperswithcode.com/paper/from-data-to-city-indicators-a-knowledge
Repo
Framework

SESA: Supervised Explicit Semantic Analysis

Title SESA: Supervised Explicit Semantic Analysis
Authors Dasha Bogdanova, Majid Yazdani
Abstract In recent years, supervised representation learning has provided state-of-the-art or near state-of-the-art results in semantic analysis tasks, including ranking and information retrieval. The core idea is to learn how to embed items into a latent space such that they optimize a supervised objective in that latent space. The dimensions of the latent space have no clear semantics, and this reduces the interpretability of the system. For example, in personalization models, it is hard to explain why a particular item is ranked high for a given user profile. We propose a novel representation learning model called Supervised Explicit Semantic Analysis (SESA) that is trained in a supervised fashion to embed items into a set of dimensions with explicit semantics. The model learns to compare two objects by representing them in this explicit space, where each dimension corresponds to a concept from a knowledge base. This work extends Explicit Semantic Analysis (ESA) with a supervised model for ranking problems. We apply this model to the task of Job-Profile relevance at LinkedIn, in which a set of skills defines the explicit dimensions of the space. Every profile and job is encoded into this set of skills, and their similarity is calculated in this space. We use RNNs to embed text input into this space. In addition to interpretability, our model makes use of the web-scale collaborative skills data that users provide for each LinkedIn profile. Our model provides state-of-the-art results while remaining interpretable.
Tasks Information Retrieval, Representation Learning
Published 2017-08-10
URL http://arxiv.org/abs/1708.03246v1
PDF http://arxiv.org/pdf/1708.03246v1.pdf
PWC https://paperswithcode.com/paper/sesa-supervised-explicit-semantic-analysis
Repo
Framework
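SESA's interpretability comes from the embedding space itself: each dimension is a named concept (here, a skill). A toy sketch of scoring Job-Profile relevance in such an explicit space; the skill vocabulary and weights below are hypothetical, and SESA actually learns the text-to-skill embedding with RNNs rather than taking weights as given:

```python
import numpy as np

# Hypothetical skill vocabulary; in SESA each dimension of the
# embedding space is a concept from a knowledge base.
SKILLS = ["python", "sql", "ml", "design"]

def embed(skill_weights):
    """Map an item (profile or job) to the explicit, unit-norm skill space."""
    v = np.array([skill_weights.get(s, 0.0) for s in SKILLS])
    n = np.linalg.norm(v)
    return v / n if n else v

def relevance(profile, job):
    """Cosine similarity in the explicit space. Each coordinate of the
    score decomposes into a named skill, which is what makes the
    ranking explainable (e.g. 'matched on python')."""
    return float(embed(profile) @ embed(job))

profile = {"python": 0.9, "ml": 0.8}
job = {"python": 1.0, "sql": 0.5}
print(round(relevance(profile, job), 3))
```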

Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

Title Linear classifier design under heteroscedasticity in Linear Discriminant Analysis
Authors Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt, Elena Gaura
Abstract Under normality and homoscedasticity assumptions, Linear Discriminant Analysis (LDA) is known to be optimal in terms of minimising the Bayes error for binary classification. In the heteroscedastic case, LDA is not guaranteed to minimise this error. Assuming heteroscedasticity, we derive a linear classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the Bayes error for binary classification. In addition, we propose a local neighbourhood search (LNS) algorithm to obtain a more robust classifier if the data are known to have a non-normal distribution. We evaluate the proposed classifiers on two artificial and ten real-world datasets that cut across a wide range of application areas, including handwriting recognition, medical diagnosis and remote sensing, and compare our algorithm against existing LDA approaches and other linear classifiers. The GLD is shown to outperform the original LDA procedure in terms of classification accuracy under heteroscedasticity. While it compares favourably with other existing heteroscedastic LDA approaches, the GLD requires up to 60 times less training time on some datasets. Our comparison with the support vector machine (SVM) also shows that the GLD, together with the LNS, requires up to 150 times less training time to achieve an equivalent classification accuracy on some of the datasets. Thus, our algorithms can provide a cheap and reliable option for classification in many expert systems.
Tasks Medical Diagnosis
Published 2017-03-24
URL http://arxiv.org/abs/1703.08434v1
PDF http://arxiv.org/pdf/1703.08434v1.pdf
PWC https://paperswithcode.com/paper/linear-classifier-design-under
Repo
Framework
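For contrast with the paper's GLD, the classical homoscedastic baseline it improves on is the Fisher/LDA direction $w = \Sigma^{-1}(\mu_1 - \mu_0)$ with a pooled covariance $\Sigma$. A numpy sketch of that baseline only; the GLD itself instead searches directly for the Bayes-error-minimising linear classifier when the two class covariances differ:

```python
import numpy as np

def lda_direction(X0, X1):
    """Fisher/LDA discriminant direction under homoscedasticity:
    w = pooled_Sigma^{-1} (mu1 - mu0).

    When the class covariances differ (heteroscedasticity), this
    direction is no longer Bayes-optimal, which is the gap the
    paper's GLD addresses.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S0, S1 = np.cov(X0, rowvar=False), np.cov(X1, rowvar=False)
    n0, n1 = len(X0), len(X1)
    pooled = ((n0 - 1) * S0 + (n1 - 1) * S1) / (n0 + n1 - 2)
    return np.linalg.solve(pooled, mu1 - mu0)

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0, 0], size=(200, 2))
X1 = rng.normal(loc=[2, 1], size=(200, 2))
w = lda_direction(X0, X1)
# projections of the two classes separate along w
print((X1 @ w).mean() > (X0 @ w).mean())  # True
```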