July 29, 2019


Paper Group ANR 127

Analogical-based Bayesian Optimization. Report: Dynamic Eye Movement Matching and Visualization Tool in Neuro Gesture. Visual Question Generation as Dual Task of Visual Question Answering. Evidence for the size principle in semantic and perceptual domains. PANFIS++: A Generalized Approach to Evolving Learning. Pronunciation recognition of English p …

Analogical-based Bayesian Optimization


Title	Analogical-based Bayesian Optimization
Authors	Trung Le, Khanh Nguyen, Tu Dinh Nguyen, Dinh Phung
Abstract	Some real-world problems revolve around solving the optimization problem $\max_{x\in\mathcal{X}} f(x)$, where $f(\cdot)$ is a black-box function and $\mathcal{X}$ might be a set of non-vectorial objects (e.g., distributions) on which we can only define a symmetric and non-negative similarity score. This setting requires a novel view of the standard framework of Bayesian Optimization that generalizes its core insight. In this spirit, we propose Analogical-based Bayesian Optimization, which can maximize a black-box function over a domain where only a similarity score can be defined. Our pathway is as follows: we first build on the geometric view of Gaussian Processes (GP) to define the concept of influence level, which allows us to analytically represent the predictive means and variances of GP posteriors, and then use that view to replace the kernel similarity with a more generic similarity score. Furthermore, we propose two strategies for finding a batch of query points that can efficiently handle high-dimensional data.
Tasks	Gaussian Processes
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06390v1
PDF	http://arxiv.org/pdf/1709.06390v1.pdf
PWC	https://paperswithcode.com/paper/analogical-based-bayesian-optimization
Repo
Framework
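
The abstract above mentions GP predictive means and variances computed from a similarity score rather than a vector-space kernel. The following is a minimal sketch of the textbook GP posterior formulas with a pluggable similarity function, not the paper's influence-level construction. The function names, the noise value, and the choice of the Bhattacharyya coefficient as a similarity between discrete distributions are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Dist = Sequence[float]  # a discrete probability distribution (illustrative object type)


def bhattacharyya(p: Dist, q: Dist) -> float:
    """Symmetric, non-negative similarity between two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def similarity_posterior(
    sim: Callable[[Dist, Dist], float],
    xs: Sequence[Dist],
    ys: Sequence[float],
    x_new: Dist,
    noise: float = 1e-6,
) -> Tuple[float, float]:
    """Standard GP posterior mean and variance, with `sim` in place of a kernel.

    Assumption: the similarity matrix is positive semi-definite, which an
    arbitrary similarity score does not guarantee.
    """
    n = len(xs)
    gram = [
        [sim(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    k_new = [sim(x, x_new) for x in xs]
    alpha = solve_linear(gram, list(ys))  # (K + noise I)^-1 y
    beta = solve_linear(gram, k_new)      # (K + noise I)^-1 k
    mean = sum(k * a for k, a in zip(k_new, alpha))
    var = sim(x_new, x_new) - sum(k * b for k, b in zip(k_new, beta))
    return mean, max(var, 0.0)


observed = [(0.7, 0.2, 0.1), (0.1, 0.8, 0.1)]
values = [1.0, 0.0]
print(similarity_posterior(bhattacharyya, observed, values, (0.2, 0.2, 0.6)))
```

At an already observed point the predicted variance collapses towards zero, while an unseen distribution keeps a visibly larger variance; an acquisition function would trade that variance off against the predicted mean.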

Report: Dynamic Eye Movement Matching and Visualization Tool in Neuro Gesture


Title	Report: Dynamic Eye Movement Matching and Visualization Tool in Neuro Gesture
Authors	Qiangeng Xu, John Kender
Abstract	In research on the impact of gestures used by a lecturer, one challenging task is to infer the attention of an audience. Two important measurements that can help infer the level of attention are eye movement data and electroencephalography (EEG) data. Under the fundamental assumption that a group of people would look at the same place if they all pay attention at the same time, we apply a method, “Time Warp Edit Distance”, to calculate the similarity of their eye movement trajectories. Moreover, we cluster the eye movement patterns of audience members based on these pairwise similarity metrics. Finally, since we don’t have a direct metric for the “attention” ground truth, a visual assessment is beneficial for evaluating the gesture-attention relationship. Thus we also implement a visualization tool.
Tasks	EEG
Published	2017-12-27
URL	http://arxiv.org/abs/1712.09709v2
PDF	http://arxiv.org/pdf/1712.09709v2.pdf
PWC	https://paperswithcode.com/paper/report-dynamic-eye-movement-matching-and
Repo
Framework
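
The abstract above relies on Time Warp Edit Distance (TWED) to compare eye movement trajectories. The sketch below follows the commonly published dynamic-programming form of TWED for timestamped 2D points. The parameter names `nu` (time stiffness) and `lam` (deletion penalty), their default values, and the sample gaze data are assumptions for illustration, not values taken from the report.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def twed(
    a: Sequence[Point],
    ta: Sequence[float],
    b: Sequence[Point],
    tb: Sequence[float],
    nu: float = 0.001,
    lam: float = 1.0,
) -> float:
    """Time Warp Edit Distance between two timestamped 2D trajectories."""
    if len(a) != len(ta) or len(b) != len(tb):
        raise ValueError("each trajectory needs one timestamp per point")

    # Pad with a dummy sample at the origin and time 0 so index 0 is the "empty" prefix.
    pa: List[Point] = [(0.0, 0.0)] + list(a)
    pb: List[Point] = [(0.0, 0.0)] + list(b)
    sa = [0.0] + list(ta)
    sb = [0.0] + list(tb)
    n, m = len(pa), len(pb)

    dp = [[math.inf] * m for _ in range(n)]
    dp[0][0] = 0.0
    for i in range(1, n):
        for j in range(1, m):
            # Delete a sample from the first trajectory.
            delete_a = (dp[i - 1][j] + math.dist(pa[i - 1], pa[i])
                        + nu * (sa[i] - sa[i - 1]) + lam)
            # Delete a sample from the second trajectory.
            delete_b = (dp[i][j - 1] + math.dist(pb[j - 1], pb[j])
                        + nu * (sb[j] - sb[j - 1]) + lam)
            # Match the current samples and their predecessors.
            match = (dp[i - 1][j - 1]
                     + math.dist(pa[i], pb[j]) + math.dist(pa[i - 1], pb[j - 1])
                     + nu * (abs(sa[i] - sb[j]) + abs(sa[i - 1] - sb[j - 1])))
            dp[i][j] = min(delete_a, delete_b, match)
    return dp[n - 1][m - 1]


# Hypothetical gaze samples (x, y) with timestamps in seconds.
viewer_1 = [(0.1, 0.1), (0.4, 0.2), (0.8, 0.3)]
viewer_2 = [(0.1, 0.2), (0.5, 0.2), (0.7, 0.4)]
times = [0.0, 0.5, 1.0]
print(twed(viewer_1, times, viewer_2, times))
```

Computing this distance for every pair of viewers yields the pairwise matrix that a clustering step could consume.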

Visual Question Generation as Dual Task of Visual Question Answering


Title	Visual Question Generation as Dual Task of Visual Question Answering
Authors	Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang
Abstract	Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision that have so far been explored separately. In this work, we propose an end-to-end unified framework, the Invertible Question Answering Network (iQAN), to leverage the complementary relations between questions and answers in images by jointly training the model on VQA and VQG tasks. A corresponding parameter-sharing scheme and regularization terms are proposed as constraints that explicitly leverage the dependencies between questions and answers to guide the training process. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the large-scale visual question answering datasets CLEVR and VQA2, our iQAN improves the VQA accuracy over the baselines. We also show that the dual learning framework of iQAN can be generalized to other VQA architectures and consistently improves the results on both the VQA and VQG tasks.
Tasks	Question Answering, Question Generation, Visual Question Answering
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07192v1
PDF	http://arxiv.org/pdf/1709.07192v1.pdf
PWC	https://paperswithcode.com/paper/visual-question-generation-as-dual-task-of
Repo
Framework

Evidence for the size principle in semantic and perceptual domains


Title	Evidence for the size principle in semantic and perceptual domains
Authors	Joshua C. Peterson, Thomas L. Griffiths
Abstract	Shepard’s Universal Law of Generalization offered a compelling case for the first physics-like law in cognitive science that should hold for all intelligent agents in the universe. Shepard’s account is based on a rational Bayesian model of generalization, providing an answer to the question of why such a law should emerge. Extending this account to explain how humans use multiple examples to make better generalizations requires an additional assumption, called the size principle: hypotheses that pick out fewer objects should make a larger contribution to generalization. The degree to which this principle warrants similarly law-like status is far from conclusive. Typically, evaluating this principle has not been straightforward, requiring additional assumptions. We present a new method for evaluating the size principle that is more direct, and apply this method to a diverse array of datasets. Our results provide support for the broad applicability of the size principle.
Tasks
Published	2017-05-09
URL	http://arxiv.org/abs/1705.03260v1
PDF	http://arxiv.org/pdf/1705.03260v1.pdf
PWC	https://paperswithcode.com/paper/evidence-for-the-size-principle-in-semantic
Repo
Framework
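
The size principle described in the abstract above can be illustrated with a small worked example in the style of a Bayesian generalization model. This is only a sketch under stated assumptions: hypotheses are explicit sets, examples are sampled uniformly from the true hypothesis (so n consistent examples have likelihood (1/|h|)^n), and the two hypotheses and the uniform prior are invented for illustration rather than taken from the paper's datasets.

```python
from typing import Hashable, Sequence, Set


def generalization_probability(
    hypotheses: Sequence[Set[Hashable]],
    prior: Sequence[float],
    examples: Sequence[Hashable],
    query: Hashable,
) -> float:
    """Probability that `query` belongs to the concept, given the examples.

    Size principle: a hypothesis consistent with all n examples gets
    likelihood (1 / |h|) ** n, so smaller hypotheses weigh more.
    """
    weights = []
    for h, p in zip(hypotheses, prior):
        consistent = all(x in h for x in examples)
        weights.append(p * (1.0 / len(h)) ** len(examples) if consistent else 0.0)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no hypothesis is consistent with the examples")
    return sum(w for h, w in zip(hypotheses, weights) if query in h) / total


powers_of_two = {2, 4, 8, 16, 32, 64}   # 6 members
even_numbers = set(range(2, 65, 2))     # 32 members
hyps = [powers_of_two, even_numbers]
uniform = [0.5, 0.5]

# 10 is even but not a power of two, so only the larger hypothesis supports it.
print(generalization_probability(hyps, uniform, [16], 10))
print(generalization_probability(hyps, uniform, [16, 8, 2], 10))
```

With one example the query 10 still receives noticeable support, but after three examples that are all powers of two the smaller hypothesis dominates and generalization to 10 drops sharply.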

PANFIS++: A Generalized Approach to Evolving Learning


Title	PANFIS++: A Generalized Approach to Evolving Learning
Authors	Mahardhika Pratama
Abstract	The concept of evolving intelligent system (EIS) provides an effective avenue for data stream mining because it is capable of coping with two prominent issues: online learning and rapidly changing environments. We note at least three uncharted territories of existing EISs: data uncertainty, temporal system dynamics, and redundant data streams. This book chapter aims at delivering a concrete solution to these problems with the development of a novel learning algorithm, namely PANFIS++. PANFIS++ is a generalized version of PANFIS that puts forward three important components: 1) An online active learning scenario is developed to overcome redundant data streams. This module actively selects data streams for the training process, thereby expediting execution time and enhancing generalization performance. 2) PANFIS++ is built upon an interval type-2 fuzzy system environment, which incorporates the so-called footprint of uncertainty. This component provides a degree of tolerance for data uncertainty. 3) PANFIS++ is structured under a recurrent network architecture with a self-feedback loop. This is meant to tackle the temporal system dynamics. The efficacy of PANFIS++ has been numerically validated through numerous real-world and synthetic case studies, where it delivers the highest predictive accuracy while retaining the lowest complexity.
Tasks	Active Learning
Published	2017-05-06
URL	http://arxiv.org/abs/1705.02476v1
PDF	http://arxiv.org/pdf/1705.02476v1.pdf
PWC	https://paperswithcode.com/paper/panfis-a-generalized-approach-to-evolving
Repo
Framework

Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients


Title	Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients
Authors	Keith Y. Patarroyo, Vladimir Vargas-Calderón
Abstract	The Vocal Joystick Vowel Corpus, by Washington University, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748 sound files from the corpus were used and subjected to Linear Predictive Coding (LPC) to compute their formants, and to the Mel Frequency Cepstral Coefficients (MFCC) algorithm to compute the cepstral coefficients. A Decision Tree Classifier was used to build a predictive model that learnt the patterns of the first two formants measured in the data set, as well as the patterns of the 13 cepstral coefficients. An accuracy of 70% was achieved using formants for the mentioned phonemes. For the MFCC analysis, an accuracy of 52% was achieved, rising to 71% when /ə/ was ignored. The results show that the studied algorithms are far from matching the ability of human hearing to distinguish subtle differences in sounds.
Tasks	Speech Recognition
Published	2017-02-23
URL	http://arxiv.org/abs/1702.07071v1
PDF	http://arxiv.org/pdf/1702.07071v1.pdf
PWC	https://paperswithcode.com/paper/pronunciation-recognition-of-english-phonemes
Repo
Framework

Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet


Title	Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet
Authors	Amarjot Singh, Nick Kingsbury
Abstract	We propose a DTCWT ScatterNet Convolutional Neural Network (DTSCNN) formed by replacing the first few layers of a CNN network with a parametric log based DTCWT ScatterNet. The ScatterNet extracts edge based invariant representations that are used by the later layers of the CNN to learn high-level features. This improves the training of the network as the later layers can learn more complex patterns from the start of learning because the edge representations are already present. The efficient learning of the DTSCNN network is demonstrated on CIFAR-10 and Caltech-101 datasets. The generic nature of the ScatterNet front-end is shown by an equivalent performance to pre-trained CNN front-ends. A comparison with the state-of-the-art on CIFAR-10 and Caltech-101 datasets is also presented.
Tasks
Published	2017-08-30
URL	http://arxiv.org/abs/1708.09259v1
PDF	http://arxiv.org/pdf/1708.09259v1.pdf
PWC	https://paperswithcode.com/paper/efficient-convolutional-network-learning
Repo
Framework

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations


Title	Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations
Authors	Boris Hanin
Abstract	This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ so that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? Our approach in this paper is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well-suited for representing convex functions. In particular, we prove that ReLU nets with width $d+1$ can approximate any continuous convex function of $d$ variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the $d$-dimensional cube $[0,1]^d$ by ReLU nets with width $d+3$.
Tasks
Published	2017-08-09
URL	http://arxiv.org/abs/1708.02691v3
PDF	http://arxiv.org/pdf/1708.02691v3.pdf
PWC	https://paperswithcode.com/paper/universal-function-approximation-by-deep
Repo
Framework
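
The observation in the abstract above, that ReLU nets suit convex functions, can be illustrated with the identity max(r, c) = c + ReLU(r - c). A convex function is well approximated by a maximum of affine pieces, and that maximum can be accumulated one piece at a time, with each step carrying only the input and one running value. The sketch below is a loose illustration of that idea rather than the paper's construction; the function names and the tangent-line example for x^2 are assumptions.

```python
from typing import List, Sequence, Tuple

AffinePiece = Tuple[Sequence[float], float]  # (weights, bias)


def relu(z: float) -> float:
    return max(z, 0.0)


def affine(x: Sequence[float], piece: AffinePiece) -> float:
    weights, bias = piece
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def max_affine_via_relu(x: Sequence[float], pieces: List[AffinePiece]) -> float:
    """Evaluate max_k (w_k . x + b_k) using only affine maps and ReLU.

    Each loop iteration plays the role of one layer: it needs the input x
    (d numbers) plus the running maximum (1 number).
    """
    running = affine(x, pieces[0])
    for piece in pieces[1:]:
        candidate = affine(x, piece)
        running = candidate + relu(running - candidate)  # equals max(running, candidate)
    return running


# Approximate the convex function x^2 on [0, 1] by tangent lines at t = 0, 0.25, ..., 1.
# The tangent at t is 2 t x - t^2.
tangents: List[AffinePiece] = [([2.0 * t], -t * t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

for x in (0.1, 0.3, 0.6, 0.9):
    print(x, x * x, max_affine_via_relu([x], tangents))
```

Because tangent lines lie below a convex function, the approximation never overshoots x^2, and the gap shrinks as more tangent points (more depth, in the network analogy) are added.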

End-to-End Video Classification with Knowledge Graphs


Title	End-to-End Video Classification with Knowledge Graphs
Authors	Fang Yuan, Zhe Wang, Jie Lin, Luis Fernando D’Haro, Kim Jung Jae, Zeng Zeng, Vijay Chandrasekhar
Abstract	Video understanding has attracted much research attention especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning approaches including deep neural networks largely focus on the representations of the given data, humans often look beyond the data at hand and leverage external knowledge to make better decisions. Towards narrowing the gap, we propose to incorporate external knowledge graphs into video classification. In particular, we unify traditional “knowledgeless” machine learning models and knowledge graphs in a novel end-to-end framework. The framework is flexible to work with most existing video classification algorithms including state-of-the-art deep models. Finally, we conduct extensive experiments on the largest public video dataset YouTube-8M. The results are promising across the board, improving mean average precision by up to 2.9%.
Tasks	Knowledge Graphs, Video Classification, Video Understanding
Published	2017-11-06
URL	http://arxiv.org/abs/1711.01714v1
PDF	http://arxiv.org/pdf/1711.01714v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-video-classification-with
Repo
Framework

PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies


Title	PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies
Authors	Dmitry B. Rokhlin
Abstract	We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N. Cesa-Bianchi, G. Lugosi (2003). Any such supersolution gives an upper bound for the forecaster’s regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. A conventional upper bound for the worst-case regret is justified by a simple verification argument.
Tasks
Published	2017-05-02
URL	http://arxiv.org/abs/1705.01091v1
PDF	http://arxiv.org/pdf/1705.01091v1.pdf
PWC	https://paperswithcode.com/paper/pde-approach-to-the-problem-of-online
Repo
Framework
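
As background for the potential-based strategies mentioned in the abstract above, the exponentially weighted average forecaster is a standard example of a strategy derived from an exponential potential. The sketch below is not the paper's PDE construction. The experts, the outcome sequence, and the use of absolute loss are assumptions for illustration, and the bound sqrt(n ln(N) / 2) is the textbook regret bound for this forecaster with a tuned learning rate, stated here only as a sanity check.

```python
import math
from typing import List, Sequence


def exp_weights_regret(
    expert_preds: Sequence[Sequence[float]],
    outcomes: Sequence[float],
    eta: float,
) -> float:
    """Run the exponentially weighted average forecaster and return its regret.

    expert_preds[t][i] is expert i's prediction in [0, 1] at round t;
    the loss is |prediction - outcome|, which is convex and bounded by 1.
    """
    n_experts = len(expert_preds[0])
    cumulative: List[float] = [0.0] * n_experts
    forecaster_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # Subtract the smallest cumulative loss for numerical stability;
        # this does not change the normalized weights.
        low = min(cumulative)
        weights = [math.exp(-eta * (c - low)) for c in cumulative]
        total = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, preds)) / total
        forecaster_loss += abs(forecast - y)
        for i, p in enumerate(preds):
            cumulative[i] += abs(p - y)
    return forecaster_loss - min(cumulative)


rounds = 200
# Three hypothetical experts: always 0, always 1, and alternating 0/1.
advice = [[0.0, 1.0, float(t % 2)] for t in range(rounds)]
truth = [1.0 if t % 3 == 0 else 0.0 for t in range(rounds)]

eta = math.sqrt(8.0 * math.log(3) / rounds)
regret = exp_weights_regret(advice, truth, eta)
bound = math.sqrt(rounds * math.log(3) / 2.0)
print(regret, bound)
```

The regret stays below the bound for any outcome sequence, which is the kind of worst-case guarantee the abstract refers to.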

Exact Dimensionality Selection for Bayesian PCA


Title	Exact Dimensionality Selection for Bayesian PCA
Authors	Charles Bouveyron, Pierre Latouche, Pierre-Alexandre Mattei
Abstract	We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.
Tasks	Model Selection
Published	2017-03-08
URL	https://arxiv.org/abs/1703.02834v2
PDF	https://arxiv.org/pdf/1703.02834v2.pdf
PWC	https://paperswithcode.com/paper/exact-dimensionality-selection-for-bayesian
Repo
Framework

Iterative Multi-document Neural Attention for Multiple Answer Prediction


Title	Iterative Multi-document Neural Attention for Multiple Answer Prediction
Authors	Claudio Greco, Alessandro Suglia, Pierpaolo Basile, Gaetano Rossiello, Giovanni Semeraro
Abstract	People have information needs of varying complexity, which can be addressed by an intelligent agent able to answer properly formulated questions, possibly taking user context and preferences into account. In a scenario in which the user profile can be considered as a question, intelligent agents able to answer questions can be used to find the most relevant answers for a given user. In this work we propose a novel model based on Artificial Neural Networks to answer questions with multiple answers by exploiting multiple facts retrieved from a knowledge base. The model is evaluated on the factoid Question Answering and top-n recommendation tasks of the bAbI Movie Dialog dataset. After assessing the performance of the model on both tasks, we try to define the long-term goal of a conversational recommender system able to interact using natural language and to support users in their information seeking processes in a personalized way.
Tasks	Question Answering, Recommendation Systems
Published	2017-02-08
URL	http://arxiv.org/abs/1702.02367v1
PDF	http://arxiv.org/pdf/1702.02367v1.pdf
PWC	https://paperswithcode.com/paper/iterative-multi-document-neural-attention-for
Repo
Framework

A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification


Title	A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification
Authors	William Lotter, Greg Sorensen, David Cox
Abstract	Screening mammography is an important front-line tool for the early detection of breast cancer, and some 39 million exams are conducted each year in the United States alone. Here, we describe a multi-scale convolutional neural network (CNN) trained with a curriculum learning strategy that achieves high levels of accuracy in classifying mammograms. Specifically, we first train CNN-based patch classifiers on segmentation masks of lesions in mammograms, and then use the learned features to initialize a scanning-based model that renders a decision on the whole image, trained end-to-end on outcome data. We demonstrate that our approach effectively handles the “needle in a haystack” nature of full-image mammogram classification, achieving 0.92 AUROC on the DDSM dataset.
Tasks
Published	2017-07-21
URL	http://arxiv.org/abs/1707.06978v1
PDF	http://arxiv.org/pdf/1707.06978v1.pdf
PWC	https://paperswithcode.com/paper/a-multi-scale-cnn-and-curriculum-learning
Repo
Framework

Geometric calibration of Colour and Stereo Surface Imaging System of ESA’s Trace Gas Orbiter


Title	Geometric calibration of Colour and Stereo Surface Imaging System of ESA’s Trace Gas Orbiter
Authors	Stepan Tulyakov, Anton Ivanov, Nicolas Thomas, Victoria Roloff, Antoine Pommerol, Gabriele Cremonese, Thomas Weigel, Francois Fleuret
Abstract	There are many geometric calibration methods for “standard” cameras. These methods, however, cannot be used for the calibration of telescopes with large focal lengths and complex off-axis optics. Moreover, specialized calibration methods for telescopes are scarce in the literature. We describe the calibration method that we developed for the Colour and Stereo Surface Imaging System (CaSSIS) telescope, on board the ExoMars Trace Gas Orbiter (TGO). Although our method is described in the context of CaSSIS, with camera-specific experiments, it is general and can be applied to other telescopes. We further encourage re-use of the proposed method by making our calibration code and data available online.
Tasks	Calibration
Published	2017-07-03
URL	http://arxiv.org/abs/1707.00606v1
PDF	http://arxiv.org/pdf/1707.00606v1.pdf
PWC	https://paperswithcode.com/paper/geometric-calibration-of-colour-and-stereo
Repo
Framework

Co-training for Demographic Classification Using Deep Learning from Label Proportions


Title	Co-training for Demographic Classification Using Deep Learning from Label Proportions
Authors	Ehsan Mohammady Ardehaly, Aron Culotta
Abstract	Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success is typically dependent on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light, or distant, supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportions (LLP) setting, in which the training data consist of bags of unlabeled instances with associated label distributions for each bag. We introduce a new regularization layer, Batch Averager, that can be appended to the last layer of any deep neural network to convert it from supervised learning to LLP. This layer can be implemented readily with existing deep learning packages. To further support domains in which the data consist of two conditionally independent feature views (e.g. image and text), we propose a co-training algorithm that iteratively generates pseudo bags and refits the deep LLP model to improve classification accuracy. We demonstrate our models on demographic attribute classification (gender and race/ethnicity), which has many applications in social media analysis, public health, and marketing. We conduct experiments to predict demographics of Twitter users based on their tweets and profile image, without requiring any user-level annotations for training. We find that the deep LLP approach outperforms baselines for both text and image features separately. Additionally, we find that the co-training algorithm improves image and text classification by 4% and 8% absolute F1, respectively. Finally, an ensemble of text and image classifiers further improves the absolute F1 measure by 4% on average.
Tasks	Text Classification
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04108v1
PDF	http://arxiv.org/pdf/1709.04108v1.pdf
PWC	https://paperswithcode.com/paper/co-training-for-demographic-classification
Repo
Framework
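
The Batch Averager idea in the abstract above, averaging per-instance predictions over a bag so they can be compared with the bag's known label proportions, can be sketched in a few lines. This is a minimal framework-free illustration and not the authors' layer: the function names are hypothetical, and the use of KL divergence as the bag-level loss is an assumption.

```python
import math
from typing import List, Sequence


def batch_average(instance_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-instance class probabilities over one bag."""
    n = len(instance_probs)
    n_classes = len(instance_probs[0])
    return [sum(row[c] for row in instance_probs) / n for c in range(n_classes)]


def kl_divergence(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || predicted); terms with zero target mass contribute nothing."""
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0.0
    )


def bag_loss(instance_probs: Sequence[Sequence[float]],
             label_proportions: Sequence[float]) -> float:
    """Bag-level loss: compare averaged predictions with known proportions."""
    return kl_divergence(label_proportions, batch_average(instance_probs))


# Hypothetical softmax outputs for a bag of three unlabeled users, two classes.
bag = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(batch_average(bag))            # roughly [0.6, 0.4]
print(bag_loss(bag, [0.6, 0.4]))     # close to zero: proportions match
print(bag_loss(bag, [0.2, 0.8]))     # larger: proportions disagree
```

In a deep learning package the same averaging would sit after the network's output layer, so that gradients of the bag-level loss flow back to every instance in the bag.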