July 29, 2019

Paper Group ANR 127

Analogical-based Bayesian Optimization

Title Analogical-based Bayesian Optimization
Authors Trung Le, Khanh Nguyen, Tu Dinh Nguyen, Dinh Phung
Abstract Some real-world problems reduce to solving the optimization problem $\max_{x\in\mathcal{X}} f(x)$, where $f(\cdot)$ is a black-box function and $\mathcal{X}$ may be a set of non-vectorial objects (e.g., distributions) on which we can only define a symmetric, non-negative similarity score. This setting calls for a new view of the standard Bayesian Optimization framework, one that generalizes its core insight. In this spirit, we propose Analogical-based Bayesian Optimization, which can maximize a black-box function over a domain where only a similarity score can be defined. Our approach is as follows: we first build on the geometric view of Gaussian Processes (GP) to define the concept of influence level, which allows us to represent the predictive means and variances of GP posteriors analytically, and we use that view to replace the kernel similarity with a more generic similarity score. Furthermore, we propose two strategies for finding a batch of query points that can efficiently handle high-dimensional data.
Tasks Gaussian Processes
Published 2017-09-19
URL http://arxiv.org/abs/1709.06390v1
PDF http://arxiv.org/pdf/1709.06390v1.pdf
PWC https://paperswithcode.com/paper/analogical-based-bayesian-optimization
Repo
Framework
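For reference, the standard GP posterior mean $k_*^\top (K + \sigma^2 I)^{-1} y$ and variance $k_{**} - k_*^\top (K + \sigma^2 I)^{-1} k_*$ that the paper generalizes can be sketched with an arbitrary similarity function plugged in where the kernel would go. This is a minimal sketch, not the paper's influence-level construction: it assumes the similarity matrix is positive semi-definite (the Jaccard similarity on finite sets used here as a hypothetical non-vectorial domain is), and the names `gp_posterior`, `ucb`, `jaccard`, the noise level and the UCB rule with `beta=2.0` are illustrative choices.

```python
import math

def jaccard(a, b):
    """Similarity between two finite sets (a non-vectorial domain)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior(sim, X, y, x_new, noise=1e-6):
    """Posterior mean and variance at x_new, using `sim` in place of a kernel."""
    K = [[sim(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [sim(x_new, a) for a in X]
    mean = sum(k * w for k, w in zip(k_star, solve(K, y)))
    v = solve(K, k_star)
    var = sim(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)  # clip tiny negative values caused by round-off

def ucb(sim, X, y, candidates, beta=2.0):
    """Pick the candidate with the largest upper confidence bound mean + beta * std."""
    def score(c):
        m, v = gp_posterior(sim, X, y, c)
        return m + beta * math.sqrt(v)
    return max(candidates, key=score)
```

With observations at the sets `{1, 2}`, `{2, 3}` and `{5, 6}`, a candidate that shares no elements with any of them has posterior variance 1, so UCB prefers exploring it over re-querying a known point.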

Report: Dynamic Eye Movement Matching and Visualization Tool in Neuro Gesture

Title Report: Dynamic Eye Movement Matching and Visualization Tool in Neuro Gesture
Authors Qiangeng Xu, John Kender
Abstract In research on the impact of the gestures used by a lecturer, one challenging task is to infer the attention of an audience. Two important measurements that can help infer the level of attention are eye movement data and electroencephalography (EEG) data. Under the fundamental assumption that a group of people would look at the same place if they all pay attention at the same time, we apply a method, “Time Warp Edit Distance”, to calculate the similarity of their eye movement trajectories. We also cluster the audience's eye movement patterns based on these pairwise similarity metrics. Finally, since we have no direct metric for the “attention” ground truth, a visual assessment would be beneficial for evaluating the gesture-attention relationship, so we also implement a visualization tool.
Tasks EEG
Published 2017-12-27
URL http://arxiv.org/abs/1712.09709v2
PDF http://arxiv.org/pdf/1712.09709v2.pdf
PWC https://paperswithcode.com/paper/report-dynamic-eye-movement-matching-and
Repo
Framework
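The abstract names Time Warp Edit Distance (TWED, proposed by Marteau) without giving its recurrence. Below is a minimal dynamic-programming sketch for 2-D gaze trajectories, assuming Euclidean distance between gaze points, zero-padding at time 0 as in the commonly cited formulation, and illustrative values for the stiffness `nu` and deletion penalty `lam`. The function names and defaults are hypothetical, not taken from the report.

```python
import math

def twed(a, ta, b, tb, nu=0.5, lam=1.0):
    """Time Warp Edit Distance between two timestamped 2-D trajectories.

    a, b: lists of (x, y) gaze points; ta, tb: their increasing timestamps.
    nu: stiffness (penalty on time differences); lam: constant deletion penalty.
    """
    # Pad both series with a dummy sample (origin, time 0), as in the usual formulation.
    a = [(0.0, 0.0)] + list(a)
    b = [(0.0, 0.0)] + list(b)
    ta = [0.0] + list(ta)
    tb = [0.0] + list(tb)
    n, m = len(a), len(b)
    dp = [[math.inf] * m for _ in range(n)]
    dp[0][0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a[i]: cost of the jump a[i-1] -> a[i] plus time and constant penalties.
            del_a = dp[i - 1][j] + math.dist(a[i - 1], a[i]) + nu * (ta[i] - ta[i - 1]) + lam
            # Delete b[j]: the symmetric case on the second trajectory.
            del_b = dp[i][j - 1] + math.dist(b[j - 1], b[j]) + nu * (tb[j] - tb[j - 1]) + lam
            # Match a[i] with b[j]: compare current and previous samples plus time offsets.
            match = (dp[i - 1][j - 1]
                     + math.dist(a[i], b[j]) + math.dist(a[i - 1], b[j - 1])
                     + nu * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            dp[i][j] = min(del_a, del_b, match)
    return dp[n - 1][m - 1]

def pairwise_twed(trajectories, **kwargs):
    """Symmetric distance matrix over (points, timestamps) pairs, e.g. as input to clustering."""
    return [[twed(a, ta, b, tb, **kwargs) for (b, tb) in trajectories]
            for (a, ta) in trajectories]
```

The pairwise matrix is what a distance-based clustering step (for example, hierarchical clustering) would consume; which clustering algorithm the report used is not stated here.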

Visual Question Generation as Dual Task of Visual Question Answering

Title Visual Question Generation as Dual Task of Visual Question Answering
Authors Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang
Abstract Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision that have so far been explored separately. In this work, we propose an end-to-end unified framework, the Invertible Question Answering Network (iQAN), to leverage the complementary relations between questions and answers in images by jointly training the model on VQA and VQG tasks. A corresponding parameter-sharing scheme and regularization terms are proposed as constraints to explicitly leverage the dependencies between questions and answers to guide the training process. After training, iQAN can take either a question or an answer as input and output its counterpart. Evaluated on the large-scale visual question answering datasets CLEVR and VQA2, our iQAN improves VQA accuracy over the baselines. We also show that the dual learning framework of iQAN can be generalized to other VQA architectures and consistently improves results on both the VQA and VQG tasks.
Tasks Question Answering, Question Generation, Visual Question Answering
Published 2017-09-21
URL http://arxiv.org/abs/1709.07192v1
PDF http://arxiv.org/pdf/1709.07192v1.pdf
PWC https://paperswithcode.com/paper/visual-question-generation-as-dual-task-of
Repo
Framework

Evidence for the size principle in semantic and perceptual domains

Title Evidence for the size principle in semantic and perceptual domains
Authors Joshua C. Peterson, Thomas L. Griffiths
Abstract Shepard’s Universal Law of Generalization offered a compelling case for the first physics-like law in cognitive science that should hold for all intelligent agents in the universe. Shepard’s account is based on a rational Bayesian model of generalization, providing an answer to the question of why such a law should emerge. Extending this account to explain how humans use multiple examples to make better generalizations requires an additional assumption, called the size principle: hypotheses that pick out fewer objects should make a larger contribution to generalization. The degree to which this principle warrants similarly law-like status is far from conclusive. Typically, evaluating this principle has not been straightforward, requiring additional assumptions. We present a new method for evaluating the size principle that is more direct, and apply this method to a diverse array of datasets. Our results provide support for the broad applicability of the size principle.
Tasks
Published 2017-05-09
URL http://arxiv.org/abs/1705.03260v1
PDF http://arxiv.org/pdf/1705.03260v1.pdf
PWC https://paperswithcode.com/paper/evidence-for-the-size-principle-in-semantic
Repo
Framework
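The size principle can be made concrete with the standard Bayesian generalization computation (in the style of Tenenbaum and Griffiths): each of $n$ examples $X$ is assumed to be drawn uniformly from the true hypothesis, so $p(X \mid h) = (1/|h|)^n$ for consistent $h$, and $p(y \in C \mid X) = \sum_{h \ni y} p(h)\,p(X \mid h) / \sum_{h} p(h)\,p(X \mid h)$. The sketch is illustrative only: the hypothesis space, uniform prior and function names are hypothetical, and it is not the paper's new evaluation method.

```python
def likelihood(examples, h):
    """Size principle: examples drawn uniformly from h, so p(X | h) = (1/|h|)^n."""
    if not all(x in h for x in examples):
        return 0.0  # h cannot have generated an example outside it
    return (1.0 / len(h)) ** len(examples)

def generalize(examples, y, hypotheses, prior=None):
    """Posterior probability that y belongs to the concept, summing over hypotheses."""
    if prior is None:
        prior = [1.0 / len(hypotheses)] * len(hypotheses)
    post = [p * likelihood(examples, h) for p, h in zip(prior, hypotheses)]
    z = sum(post)
    if z == 0.0:
        raise ValueError("no hypothesis is consistent with the examples")
    return sum(w for w, h in zip(post, hypotheses) if y in h) / z
```

With hypotheses {1..10} and {1..100}, one example (2) gives about 0.09 probability that 50 is in the concept, while three examples (2, 4, 6) drop it to about 0.001: the smaller hypothesis gains weight as examples accumulate, which is exactly the effect the size principle predicts.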

PANFIS++: A Generalized Approach to Evolving Learning

Title PANFIS++: A Generalized Approach to Evolving Learning
Authors Mahardhika Pratama
Abstract The concept of the evolving intelligent system (EIS) provides an effective avenue for data stream mining because it can cope with two prominent issues: online learning and rapidly changing environments. We note at least three uncharted territories for existing EISs: data uncertainty, temporal system dynamics, and redundant data streams. This book chapter aims to deliver a concrete solution to these problems through the development of a novel learning algorithm, namely PANFIS++. PANFIS++ is a generalized version of PANFIS that puts forward three important components: 1) An online active learning scenario is developed to overcome redundant data streams. This module actively selects data streams for the training process, thereby reducing execution time and enhancing generalization performance. 2) PANFIS++ is built upon an interval type-2 fuzzy system environment, which incorporates the so-called footprint of uncertainty. This component provides a degree of tolerance for data uncertainty. 3) PANFIS++ is structured as a recurrent network architecture with a self-feedback loop, which is meant to tackle the temporal system dynamics. The efficacy of PANFIS++ has been numerically validated through numerous real-world and synthetic case studies, where it delivers the highest predictive accuracy while retaining the lowest complexity.
Tasks Active Learning
Published 2017-05-06
URL http://arxiv.org/abs/1705.02476v1
PDF http://arxiv.org/pdf/1705.02476v1.pdf
PWC https://paperswithcode.com/paper/panfis-a-generalized-approach-to-evolving
Repo
Framework
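The footprint of uncertainty mentioned above can be illustrated with a textbook interval type-2 Gaussian membership function whose mean is only known to lie in [m1, m2]: each input receives a lower and an upper membership grade instead of a single number, and the band between them is the footprint of uncertainty. This is a generic sketch assuming a Gaussian shape with uncertain mean and fixed width; PANFIS++ may parameterize its rules differently, and the function names are hypothetical.

```python
import math

def gauss(x, m, sigma):
    """Type-1 Gaussian membership grade."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)

def it2_gaussian(x, m1, m2, sigma):
    """Interval type-2 Gaussian MF with uncertain mean in [m1, m2].

    Returns (lower, upper): the envelope of all Gaussians with mean in [m1, m2].
    """
    if x < m1:
        upper = gauss(x, m1, sigma)   # nearest admissible mean is m1
    elif x > m2:
        upper = gauss(x, m2, sigma)   # nearest admissible mean is m2
    else:
        upper = 1.0                   # some admissible mean equals x
    lower = min(gauss(x, m1, sigma), gauss(x, m2, sigma))  # farthest admissible mean
    return lower, upper
```

The width `upper - lower` at a given input is a rough measure of how much the rule "tolerates" uncertainty there.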

Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients

Title Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients
Authors Keith Y. Patarroyo, Vladimir Vargas-Calderón
Abstract The Vocal Joystick Vowel Corpus, from the University of Washington, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748 sound files from the corpus were used: Linear Predictive Coding (LPC) was applied to compute their formants, and the Mel Frequency Cepstral Coefficients (MFCC) algorithm was applied to compute the cepstral coefficients. A Decision Tree Classifier was used to build a predictive model that learnt the patterns of the first two formants measured in the data set, as well as the patterns of the 13 cepstral coefficients. An accuracy of 70% was achieved using formants for these phonemes. For the MFCC analysis, an accuracy of 52% was achieved, rising to 71% when /ə/ was ignored. The results show that the studied algorithms are far from matching the human ability to distinguish subtle differences between sounds.
Tasks Speech Recognition
Published 2017-02-23
URL http://arxiv.org/abs/1702.07071v1
PDF http://arxiv.org/pdf/1702.07071v1.pdf
PWC https://paperswithcode.com/paper/pronunciation-recognition-of-english-phonemes
Repo
Framework
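On the MFCC side of the comparison, the step that distinguishes MFCCs from a plain cepstrum is the warping of frequency onto the mel scale. A minimal sketch, assuming the widely used conversion $m = 2595\log_{10}(1 + f/700)$ (other variants exist, and the paper does not say which one its MFCC implementation used); the helper names, filter count and frequency range are hypothetical. It computes the centre frequencies of a mel-spaced triangular filterbank; a DCT of the log filter energies would then yield cepstral coefficients such as the 13 used above.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to mels (2595 * log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters triangular filters equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)  # n_filters + 2 edge points, endpoints excluded
    return [mel_to_hz(lo + step * k) for k in range(1, n_filters + 1)]
```

Because the mel scale is roughly linear below 1 kHz and logarithmic above, the filters crowd into the low frequencies where the first two formants of these vowels lie.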

Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet

Title Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet
Authors Amarjot Singh, Nick Kingsbury
Abstract We propose a DTCWT ScatterNet Convolutional Neural Network (DTSCNN) formed by replacing the first few layers of a CNN with a parametric log based DTCWT ScatterNet. The ScatterNet extracts edge-based invariant representations that are used by the later layers of the CNN to learn high-level features. This improves training: because the edge representations are already present, the later layers can learn more complex patterns from the start of learning. The efficient learning of the DTSCNN is demonstrated on the CIFAR-10 and Caltech-101 datasets. The generic nature of the ScatterNet front-end is shown by performance equivalent to that of pre-trained CNN front-ends. A comparison with the state of the art on the CIFAR-10 and Caltech-101 datasets is also presented.
Tasks
Published 2017-08-30
URL http://arxiv.org/abs/1708.09259v1
PDF http://arxiv.org/pdf/1708.09259v1.pdf
PWC https://paperswithcode.com/paper/efficient-convolutional-network-learning
Repo
Framework

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

Title Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations
Authors Boris Hanin
Abstract This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ such that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? Our approach is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well-suited for representing convex functions. In particular, we prove that ReLU nets with width $d+1$ can approximate any continuous convex function of $d$ variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the $d$-dimensional cube $[0,1]^d$ by ReLU nets with width $d+3$.
Tasks
Published 2017-08-09
URL http://arxiv.org/abs/1708.02691v3
PDF http://arxiv.org/pdf/1708.02691v3.pdf
PWC https://paperswithcode.com/paper/universal-function-approximation-by-deep
Repo
Framework
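The observation that ReLU nets suit convex functions rests on the identity $\max(a, b) = a + \mathrm{ReLU}(b - a)$, together with the fact that a convex piecewise-linear function is the maximum of finitely many affine functions. The sketch below illustrates that intuition; it is not the paper's construction or proof. It evaluates such a maximum one ReLU step at a time, and each step only needs the $d$ inputs (left unchanged by ReLU on $[0,1]^d$) plus one running value, which, up to handling the sign of the running value, is one way to see why width on the order of $d+1$ can suffice. The function names are hypothetical.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def affine(w, b, x):
    """Affine map w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def relu_max_of_affine(pieces, x):
    """Evaluate max_k (w_k . x + b_k) using only additions and ReLUs."""
    w0, b0 = pieces[0]
    m = affine(w0, b0, x)
    for w, b in pieces[1:]:
        # max(m, l) = m + ReLU(l - m): one ReLU per additional affine piece,
        # so depth grows with the number of pieces while width stays fixed.
        m = m + relu(affine(w, b, x) - m)
    return m
```

For example, with pieces `x1 - 0.5`, `0.5 - x1` and `x2 - 0.2`, the function computes $\max(|x_1 - 0.5|, x_2 - 0.2)$, a convex function on the unit square.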

End-to-End Video Classification with Knowledge Graphs

Title End-to-End Video Classification with Knowledge Graphs
Authors Fang Yuan, Zhe Wang, Jie Lin, Luis Fernando D’Haro, Kim Jung Jae, Zeng Zeng, Vijay Chandrasekhar
Abstract Video understanding has attracted much research attention, especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning approaches, including deep neural networks, largely focus on the representations of the given data, humans often look beyond the data at hand and leverage external knowledge to make better decisions. Towards narrowing the gap, we propose to incorporate external knowledge graphs into video classification. In particular, we unify traditional “knowledgeless” machine learning models and knowledge graphs in a novel end-to-end framework. The framework is flexible enough to work with most existing video classification algorithms, including state-of-the-art deep models. Finally, we conduct extensive experiments on the largest public video dataset, YouTube-8M. The results are promising across the board, improving mean average precision by up to 2.9%.
Tasks Knowledge Graphs, Video Classification, Video Understanding
Published 2017-11-06
URL http://arxiv.org/abs/1711.01714v1
PDF http://arxiv.org/pdf/1711.01714v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-video-classification-with
Repo
Framework

PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies

Title PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies
Authors Dmitry B. Rokhlin
Abstract We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N. Cesa-Bianchi and G. Lugosi (2003). Any such supersolution gives an upper bound for the forecaster’s regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. A conventional upper bound for the worst-case regret is justified by a simple verification argument.
Tasks
Published 2017-05-02
URL http://arxiv.org/abs/1705.01091v1
PDF http://arxiv.org/pdf/1705.01091v1.pdf
PWC https://paperswithcode.com/paper/pde-approach-to-the-problem-of-online
Repo
Framework
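The exponentially weighted average forecaster is the standard example of a potential-based strategy in the Cesa-Bianchi and Lugosi framework cited above: the exponential potential gives expert $i$ a weight proportional to $e^{-\eta L_i}$, where $L_i$ is its cumulative loss. A minimal sketch, assuming predictions and outcomes in $[0, 1]$ and absolute loss (convex in the prediction), for which the textbook bound is regret $\le \ln N/\eta + \eta T/8$. It illustrates the classical discrete-time strategy, not the PDE-derived strategies of the paper; the function names are hypothetical.

```python
import math

def exp_weights_regret(expert_preds, outcomes, eta):
    """Run the exponentially weighted average forecaster and return its regret.

    expert_preds[t][i]: expert i's prediction in [0, 1] at round t.
    outcomes[t]: outcome in [0, 1]; the loss is |prediction - outcome|.
    """
    n = len(expert_preds[0])
    cum_loss = [0.0] * n
    forecaster_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        best = min(cum_loss)  # shift for numerical stability; weights unchanged up to scale
        w = [math.exp(-eta * (L - best)) for L in cum_loss]
        p = sum(wi * fi for wi, fi in zip(w, preds)) / sum(w)
        forecaster_loss += abs(p - y)
        for i, fi in enumerate(preds):
            cum_loss[i] += abs(fi - y)
    return forecaster_loss - min(cum_loss)

def regret_bound(n_experts, rounds, eta):
    """Worst-case bound ln(N)/eta + eta*T/8 for convex losses with values in [0, 1]."""
    return math.log(n_experts) / eta + eta * rounds / 8.0
```

Choosing $\eta = \sqrt{8 \ln N / T}$ balances the two terms and gives the familiar $\sqrt{(T/2)\ln N}$ worst-case regret.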

Exact Dimensionality Selection for Bayesian PCA

Title Exact Dimensionality Selection for Bayesian PCA
Authors Charles Bouveyron, Pierre Latouche, Pierre-Alexandre Mattei
Abstract We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we derive a closed-form expression for the marginal likelihood, which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.
Tasks Model Selection
Published 2017-03-08
URL https://arxiv.org/abs/1703.02834v2
PDF https://arxiv.org/pdf/1703.02834v2.pdf
PWC https://paperswithcode.com/paper/exact-dimensionality-selection-for-bayesian
Repo
Framework

Iterative Multi-document Neural Attention for Multiple Answer Prediction

Title Iterative Multi-document Neural Attention for Multiple Answer Prediction
Authors Claudio Greco, Alessandro Suglia, Pierpaolo Basile, Gaetano Rossiello, Giovanni Semeraro
Abstract People have information needs of varying complexity, which can be met by an intelligent agent able to answer questions formulated in a proper way, possibly taking user context and preferences into account. In a scenario in which the user profile can be treated as a question, intelligent agents able to answer questions can be used to find the most relevant answers for a given user. In this work we propose a novel model based on Artificial Neural Networks that answers questions with multiple answers by exploiting multiple facts retrieved from a knowledge base. The model is evaluated on the factoid Question Answering and top-n recommendation tasks of the bAbI Movie Dialog dataset. After assessing the performance of the model on both tasks, we try to define the long-term goal of a conversational recommender system able to interact using natural language and to support users in their information-seeking processes in a personalized way.
Tasks Question Answering, Recommendation Systems
Published 2017-02-08
URL http://arxiv.org/abs/1702.02367v1
PDF http://arxiv.org/pdf/1702.02367v1.pdf
PWC https://paperswithcode.com/paper/iterative-multi-document-neural-attention-for
Repo
Framework

A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification

Title A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification
Authors William Lotter, Greg Sorensen, David Cox
Abstract Screening mammography is an important front-line tool for the early detection of breast cancer, and some 39 million exams are conducted each year in the United States alone. Here, we describe a multi-scale convolutional neural network (CNN) trained with a curriculum learning strategy that achieves high levels of accuracy in classifying mammograms. Specifically, we first train CNN-based patch classifiers on segmentation masks of lesions in mammograms, and then use the learned features to initialize a scanning-based model that renders a decision on the whole image, trained end-to-end on outcome data. We demonstrate that our approach effectively handles the “needle in a haystack” nature of full-image mammogram classification, achieving 0.92 AUROC on the DDSM dataset.
Tasks
Published 2017-07-21
URL http://arxiv.org/abs/1707.06978v1
PDF http://arxiv.org/pdf/1707.06978v1.pdf
PWC https://paperswithcode.com/paper/a-multi-scale-cnn-and-curriculum-learning
Repo
Framework
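The 0.92 AUROC reported above is the area under the ROC curve, which equals the probability that a randomly chosen positive exam receives a higher score than a randomly chosen negative one (ties counting one half). A direct, quadratic-time sketch of that definition, with a hypothetical function name and toy scores rather than any data from the paper:

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.8, 0.3, 0.1]` with labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, giving 0.75. Because the metric depends only on the ranking, it is insensitive to class imbalance in the way that matters for a screening setting where positives are rare.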

Geometric calibration of Colour and Stereo Surface Imaging System of ESA’s Trace Gas Orbiter

Title Geometric calibration of Colour and Stereo Surface Imaging System of ESA’s Trace Gas Orbiter
Authors Stepan Tulyakov, Anton Ivanov, Nicolas Thomas, Victoria Roloff, Antoine Pommerol, Gabriele Cremonese, Thomas Weigel, Francois Fleuret
Abstract There are many geometric calibration methods for “standard” cameras. These methods, however, cannot be used to calibrate telescopes with large focal lengths and complex off-axis optics. Moreover, specialized calibration methods for telescopes are scarce in the literature. We describe the calibration method that we developed for the Colour and Stereo Surface Imaging System (CaSSIS) telescope on board the ExoMars Trace Gas Orbiter (TGO). Although our method is described in the context of CaSSIS, with camera-specific experiments, it is general and can be applied to other telescopes. We further encourage re-use of the proposed method by making our calibration code and data available online.
Tasks Calibration
Published 2017-07-03
URL http://arxiv.org/abs/1707.00606v1
PDF http://arxiv.org/pdf/1707.00606v1.pdf
PWC https://paperswithcode.com/paper/geometric-calibration-of-colour-and-stereo
Repo
Framework

Co-training for Demographic Classification Using Deep Learning from Label Proportions

Title Co-training for Demographic Classification Using Deep Learning from Label Proportions
Authors Ehsan Mohammady Ardehaly, Aron Culotta
Abstract Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success typically depends on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light or distant supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportion (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g., image and text), we propose a co-training algorithm that iteratively generates pseudo bags and refits the deep LLP model to improve classification accuracy. We demonstrate our models on demographic attribute classification (gender and race/ethnicity), which has many applications in social media analysis, public health, and marketing. We conduct experiments to predict demographics of Twitter users based on their tweets and profile image, without requiring any user-level annotations for training. We find that the deep LLP approach outperforms baselines for both text and image features separately. Additionally, we find that the co-training algorithm improves image and text classification by 4% and 8% absolute F1, respectively. Finally, an ensemble of text and image classifiers further improves the absolute F1 measure by 4% on average.
Tasks Text Classification
Published 2017-09-13
URL http://arxiv.org/abs/1709.04108v1
PDF http://arxiv.org/pdf/1709.04108v1.pdf
PWC https://paperswithcode.com/paper/co-training-for-demographic-classification
Repo
Framework
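The Batch Averager idea can be sketched as follows, assuming (as the name suggests) that the layer averages the per-instance softmax outputs over a bag and that training minimizes the cross-entropy between the bag's known label proportions and that average. This is a plain-Python illustration of the loss only, not the authors' implementation; the function names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def batch_average(bag_logits):
    """Average the per-instance class probabilities over one bag."""
    probs = [softmax(z) for z in bag_logits]
    k = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(k)]

def llp_loss(bag_logits, proportions, eps=1e-12):
    """Cross-entropy between the bag's known label proportions and the averaged predictions."""
    avg = batch_average(bag_logits)
    return -sum(t * math.log(a + eps) for t, a in zip(proportions, avg))
```

Only the bag-level average is supervised, so no individual instance needs a label; the network is free to distribute the predicted classes across the bag as long as their mean matches the known proportions.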