July 27, 2019

Paper Group ANR 480

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks


Title	Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Authors	Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
Abstract	An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multitask learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.
Tasks	Multi-Task Learning, Question Answering, Visual Question Answering
Published	2017-04-02
URL	http://arxiv.org/abs/1704.00260v2
PDF	http://arxiv.org/pdf/1704.00260v2.pdf
PWC	https://paperswithcode.com/paper/aligned-image-word-representations-improve
Repo
Framework

Deep LSTM for Large Vocabulary Continuous Speech Recognition


Title	Deep LSTM for Large Vocabulary Continuous Speech Recognition
Authors	Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang
Abstract	Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, it is more difficult to train a deeper network. We introduce a training framework with layer-wise training and exponential moving average methods for deeper LSTM models. With this framework, LSTM models of more than 7 layers are successfully trained on Shenma voice search data in Mandarin, and they outperform deep LSTM models trained by the conventional approach. Moreover, for online streaming speech recognition applications, a shallow model with a low real-time factor is distilled from the very deep model. Recognition accuracy suffers little loss in the distillation process. As a result, the model trained with the proposed framework achieves a 14% relative reduction in character error rate compared to the original model, which has similar real-time capability. Furthermore, a novel transfer learning strategy with segmental Minimum Bayes-Risk is also introduced in the framework. The strategy makes it possible for training on only a small part of the dataset to outperform training on the full dataset from the beginning.
Tasks	Large Vocabulary Continuous Speech Recognition, Speech Recognition, Transfer Learning
Published	2017-03-21
URL	http://arxiv.org/abs/1703.07090v1
PDF	http://arxiv.org/pdf/1703.07090v1.pdf
PWC	https://paperswithcode.com/paper/deep-lstm-for-large-vocabulary-continuous
Repo
Framework
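
The exponential moving average mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say exactly how the average is applied, so this assumes the common variant in which a smoothed "shadow" copy of the model parameters is kept alongside the raw ones; the function name `ema_update`, the `decay` value, and the toy numbers are all hypothetical.

```python
def ema_update(shadow, params, decay=0.99):
    """Return the updated moving average of a list of parameter values."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy "training run" (assumed values): the raw parameter oscillates
# between 0.0 and 2.0, while the averaged copy moves more smoothly.
raw_steps = [[0.0], [2.0], [0.0], [2.0]]
shadow = [1.0]
for params in raw_steps:
    shadow = ema_update(shadow, params, decay=0.5)

print(shadow)
```

A larger `decay` makes the averaged copy react more slowly to each new parameter value.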

Inverse Risk-Sensitive Reinforcement Learning


Title	Inverse Risk-Sensitive Reinforcement Learning
Authors	Lillian J. Ratliff, Eric Mazumdar
Abstract	We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. We demonstrate the performance of the proposed technique on two examples, the first of which is the canonical Grid World example and the second of which is a Markov decision process modeling passengers’ decisions regarding ride-sharing. In the latter, we use pricing and travel time data from a ride-sharing company to construct the transition probabilities and rewards of the Markov decision process.
Tasks	Decision Making
Published	2017-03-29
URL	http://arxiv.org/abs/1703.09842v3
PDF	http://arxiv.org/pdf/1703.09842v3.pdf
PWC	https://paperswithcode.com/paper/inverse-risk-sensitive-reinforcement-learning
Repo
Framework

Which phoneme-to-viseme maps best improve visual-only computer lip-reading?


Title	Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
Authors	Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan
Abstract	A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Although a number of maps have been published, their effectiveness is rarely tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
Tasks	Speech Recognition, Visual Speech Recognition
Published	2017-10-03
URL	http://arxiv.org/abs/1710.01093v1
PDF	http://arxiv.org/pdf/1710.01093v1.pdf
PWC	https://paperswithcode.com/paper/which-phoneme-to-viseme-maps-best-improve
Repo
Framework

Automating Direct Speech Variations in Stories and Games


Title	Automating Direct Speech Variations in Stories and Games
Authors	Stephanie M. Lukin, James O. Ryan, Marilyn A. Walker
Abstract	Dialogue authoring in large games requires not only content creation but the subtlety of its delivery, which can vary from character to character. Manually authoring this dialogue can be tedious, time-consuming, or even altogether infeasible. This paper utilizes a rich narrative representation for modeling dialogue and an expressive natural language generation engine for realizing it, and expands upon a translation tool that bridges the two. We add functionality to the translator to allow direct speech to be modeled by the narrative representation, whereas the original translator supports only narratives told by a third person narrator. We show that we can perform character substitution in dialogues. We implement and evaluate a potential application to dialogue implementation: generating dialogue for games with big, dynamic, or procedurally-generated open worlds. We present a pilot study on human perceptions of the personalities of characters using direct speech, assuming unknown personality types at the time of authoring.
Tasks	Text Generation
Published	2017-08-30
URL	http://arxiv.org/abs/1708.09090v1
PDF	http://arxiv.org/pdf/1708.09090v1.pdf
PWC	https://paperswithcode.com/paper/automating-direct-speech-variations-in
Repo
Framework

Potential Functions based Sampling Heuristic For Optimal Path Planning


Title	Potential Functions based Sampling Heuristic For Optimal Path Planning
Authors	Ahmed Hussain Qureshi, Yasar Ayaz
Abstract	Rapidly-exploring Random Tree Star (RRT*) is a recently proposed extension of the Rapidly-exploring Random Tree (RRT) algorithm that provides a collision-free, asymptotically optimal path regardless of the obstacles' geometry in a given environment. However, one of the limitations of the RRT* algorithm is slow convergence to the optimal path solution. As a result, it consumes a large amount of memory and time due to the large number of iterations needed to reach the optimal path solution. To overcome these limitations, we propose the Potential Function Based-RRT* (P-RRT*), which incorporates the Artificial Potential Field Algorithm in RRT*. The proposed algorithm allows a considerable decrease in the number of iterations and thus leads to more efficient memory utilization and an accelerated convergence rate. In order to illustrate the usefulness of the proposed algorithm in terms of space execution and convergence rate, this paper presents rigorous simulation-based comparisons between the proposed technique and RRT* under different environmental conditions. Moreover, both algorithms are also tested and compared under non-holonomic differential constraints.
Tasks
Published	2017-04-02
URL	http://arxiv.org/abs/1704.00264v1
PDF	http://arxiv.org/pdf/1704.00264v1.pdf
PWC	https://paperswithcode.com/paper/potential-functions-based-sampling-heuristic
Repo
Framework
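
To give a feel for how an artificial potential field could bias sampling, here is a minimal sketch. It is not the authors' algorithm: it assumes only a quadratic attractive potential toward the goal, leaves out obstacles and the tree itself, and uses hypothetical names and values (`attract`, `step`, `iters`, `gain`).

```python
import math


def attract(sample, goal, step=0.1, iters=5, gain=1.0):
    """Nudge a 2-D sample downhill on U(x) = 0.5 * gain * ||x - goal||^2."""
    x, y = sample
    for _ in range(iters):
        # Gradient of the attractive potential at the current point.
        gx, gy = gain * (x - goal[0]), gain * (y - goal[1])
        x, y = x - step * gx, y - step * gy
    return (x, y)


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


goal = (10.0, 10.0)
sample = (2.0, 7.0)      # stands in for a uniformly drawn random sample
guided = attract(sample, goal)

print(distance(sample, goal), distance(guided, goal))
```

With these settings each gradient step shrinks the distance to the goal by a factor of `1 - step * gain`, so the guided sample lies closer to the goal than the original one.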

Variational Grid Setting Network


Title	Variational Grid Setting Network
Authors	Yu-Neng Chuang, Zi-Yu Huang, Yen-Lung Tsai
Abstract	We propose a new neural network architecture for automatic generation of missing characters in a Chinese font set. We call the architecture the Variational Grid Setting Network, which is based on the variational autoencoder (VAE) with some tweaks. The model is able to generate missing characters that are relatively large in size ($256 \times 256$ pixels). Moreover, we show that one can use very few samples as the training data set and still get a satisfactory result.
Tasks
Published	2017-09-30
URL	http://arxiv.org/abs/1710.01255v3
PDF	http://arxiv.org/pdf/1710.01255v3.pdf
PWC	https://paperswithcode.com/paper/variational-grid-setting-network
Repo
Framework

Privacy-Preserving Personal Model Training


Title	Privacy-Preserving Personal Model Training
Authors	Sandra Servia-Rodriguez, Liang Wang, Jianxin R. Zhao, Richard Mortier, Hamed Haddadi
Abstract	Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is pushed to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users’ current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also analyse the robustness of our approach against adversarial attacks, as well as its feasibility by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi).
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00380v3
PDF	http://arxiv.org/pdf/1703.00380v3.pdf
PWC	https://paperswithcode.com/paper/privacy-preserving-personal-model-training
Repo
Framework

General Backpropagation Algorithm for Training Second-order Neural Networks


Title	General Backpropagation Algorithm for Training Second-order Neural Networks
Authors	Fenglei Fan, Wenxiang Cong, Ge Wang
Abstract	The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd-order counterparts, in which the linear operation between inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd-order neuron already has a strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train a network consisting of 2nd-order neurons. Numerical studies are performed to verify the generalized BP algorithm.
Tasks
Published	2017-08-17
URL	http://arxiv.org/abs/1708.06243v1
PDF	http://arxiv.org/pdf/1708.06243v1.pdf
PWC	https://paperswithcode.com/paper/general-backpropagation-algorithm-for
Repo
Framework
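
The idea of replacing a neuron's linear operation with a quadratic one can be sketched as follows. The abstract does not give the exact form, so the parameterization here (a product of two affine terms plus a weighted sum of squared inputs) is an assumption, as are all names and weights. The example shows a single such neuron computing Boolean XOR, which a single linear neuron cannot do.

```python
def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """(w_r.x + b_r) * (w_g.x + b_g) + w_b.(x*x) + c, before any activation."""
    def dot(w, v):
        return sum(wi * vi for wi, vi in zip(w, v))

    squares = [xi * xi for xi in x]
    return (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squares) + c


def xor_neuron(x):
    # With these hand-picked weights the neuron computes (x1 - x2)^2,
    # which equals XOR on inputs restricted to {0, 1}.
    return quadratic_neuron(x, w_r=[1, -1], b_r=0, w_g=[1, -1], b_g=0,
                            w_b=[0, 0], c=0)


outputs = [xor_neuron([a, b]) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

The weights are set by hand here; training them with backpropagation is what the paper addresses.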

Local Word Vectors Guiding Keyphrase Extraction


Title	Local Word Vectors Guiding Keyphrase Extraction
Authors	Eirini Papagiannopoulou, Grigorios Tsoumakas
Abstract	Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e., embeddings trained from the single document under consideration. We argue that such local representations of words and keyphrases are able to accurately capture their semantics in the context of the document they are part of, and therefore can help in improving keyphrase extraction quality. Empirical results offer evidence that local representations indeed lead to better keyphrase extraction results compared both to embeddings trained on very large third corpora or larger corpora consisting of several documents of the same scientific field and to other state-of-the-art unsupervised keyphrase extraction methods.
Tasks	Word Embeddings
Published	2017-10-20
URL	http://arxiv.org/abs/1710.07503v4
PDF	http://arxiv.org/pdf/1710.07503v4.pdf
PWC	https://paperswithcode.com/paper/local-word-vectors-guiding-keyphrase
Repo
Framework

Gaussian Three-Dimensional kernel SVM for Edge Detection Applications


Title	Gaussian Three-Dimensional kernel SVM for Edge Detection Applications
Authors	Safar Irandoust-Pakchin, Aydin Ayanzadeh, Siamak Beikzadeh
Abstract	This paper presents a novel and uniform algorithm for edge detection based on an SVM (support vector machine) with a three-dimensional Gaussian radial basis function kernel. It is motivated by the disadvantages of traditional edge detection, such as inaccurate edge location, rough edges, and poor detection of soft edges. The experimental results indicate how the SVM can detect edges in an efficient way. The performance of the proposed algorithm is compared with existing methods, including the Sobel and Canny detectors. The results show that this method is better than classical algorithms such as the Canny and Sobel detectors.
Tasks	Edge Detection
Published	2017-09-30
URL	http://arxiv.org/abs/1710.01260v1
PDF	http://arxiv.org/pdf/1710.01260v1.pdf
PWC	https://paperswithcode.com/paper/gaussian-three-dimensional-kernel-svm-for
Repo
Framework

Linear centralization classifier


Title	Linear centralization classifier
Authors	Mohammad Reza Bonyadi, Viktor Vegh, David C. Reutens
Abstract	A classification algorithm, called the Linear Centralization Classifier (LCC), is introduced. The algorithm seeks to find a transformation that best maps instances from the feature space to a space where they concentrate towards the center of their own classes, while maximizing the distance between class centers. We formulate the classifier as a quadratic program with quadratic constraints. We then simplify this formulation to a linear program that can be solved effectively using a linear programming solver (e.g., simplex-dual). We extend the formulation for LCC to enable the use of kernel functions for non-linear classification applications. We compare our method with two standard classification methods (support vector machine and linear discriminant analysis) and four state-of-the-art classification methods when they are applied to eight standard classification datasets. Our experimental results show that LCC is able to classify instances more accurately (based on the area under the receiver operating characteristic) in comparison to other tested methods on the chosen datasets. We also report the results for LCC with a particular kernel on synthetic non-linear classification problems.
Tasks
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08259v1
PDF	http://arxiv.org/pdf/1712.08259v1.pdf
PWC	https://paperswithcode.com/paper/linear-centralization-classifier
Repo
Framework

A Deep Network with Visual Text Composition Behavior


Title	A Deep Network with Visual Text Composition Behavior
Authors	Hongyu Guo
Abstract	While natural languages are compositional, how state-of-the-art neural models achieve compositionality is still unclear. We propose a deep network, which not only achieves competitive accuracy for text classification, but also exhibits compositional behavior. That is, while creating hierarchical representations of a piece of text, such as a sentence, the lower layers of the network distribute their layer-specific attention weights to individual words. In contrast, the higher layers compose meaningful phrases and clauses, whose lengths increase as the networks get deeper until fully composing the sentence.
Tasks	Text Classification
Published	2017-07-05
URL	http://arxiv.org/abs/1707.01555v1
PDF	http://arxiv.org/pdf/1707.01555v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-network-with-visual-text-composition
Repo
Framework

FML-based Prediction Agent and Its Application to Game of Go


Title	FML-based Prediction Agent and Its Application to Game of Go
Authors	Chang-Shing Lee, Mei-Hui Wang, Chia-Hsiu Kao, Sheng-Chi Yang, Yusuke Nojima, Ryosuke Saga, Nan Shuo, Naoyuki Kubota
Abstract	In this paper, we present a robotic prediction agent including a darkforest Go engine, a fuzzy markup language (FML) assessment engine, an FML-based decision support engine, and a robot engine for the game of Go. The knowledge base and rule base of the FML assessment engine are constructed by referring to information from the darkforest Go engine located in NUTN and OPU, for example, the number of MCTS simulations and the winning rate prediction. The proposed robotic prediction agent first retrieves the database of a Go competition website, and then the FML assessment engine infers the winning possibility based on the information generated by the darkforest Go engine. The FML-based decision support engine computes the winning possibility based on the partial game situation inferred by the FML assessment engine. Finally, the robot engine combines with the human-friendly robot partner PALRO, produced by Fujisoft Incorporated, to report the game situation to human Go players. Experimental results show that the FML-based prediction agent can work effectively.
Tasks	Game of Go
Published	2017-04-16
URL	http://arxiv.org/abs/1704.04719v1
PDF	http://arxiv.org/pdf/1704.04719v1.pdf
PWC	https://paperswithcode.com/paper/fml-based-prediction-agent-and-its
Repo
Framework

Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis


Title	Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis
Authors	Takashi Matsubara, Tetsuo Tashiro, Kuniaki Uehara
Abstract	Accurate diagnosis of psychiatric disorders plays a critical role in improving the quality of life for patients and potentially supports the development of new treatments. Many studies have been conducted on machine learning techniques that seek brain imaging data for specific biomarkers of disorders. These studies have encountered the following dilemma: A direct classification overfits to a small number of high-dimensional samples but unsupervised feature-extraction has the risk of extracting a signal of no interest. In addition, such studies often provided only diagnoses for patients without presenting the reasons for these diagnoses. This study proposed a deep neural generative model of resting-state functional magnetic resonance imaging (fMRI) data. The proposed model is conditioned by the assumption of the subject’s state and estimates the posterior probability of the subject’s state given the imaging data, using Bayes’ rule. This study applied the proposed model to diagnose schizophrenia and bipolar disorders. Diagnostic accuracy was improved by a large margin over competitive approaches, namely classifications of functional connectivity, discriminative/generative models of region-wise signals, and those with unsupervised feature-extractors. The proposed model visualizes brain regions largely related to the disorders, thus motivating further biological investigation.
Tasks
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06260v2
PDF	http://arxiv.org/pdf/1712.06260v2.pdf
PWC	https://paperswithcode.com/paper/deep-neural-generative-model-of-functional
Repo
Framework
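
The Bayes' rule step described in the abstract above, going from a class-conditional generative model to a posterior over the subject's state, can be sketched as follows. This is only an illustration under stated assumptions: the log-likelihood values and priors are made up, and the function name `posterior` is hypothetical; in the paper the likelihoods would come from the deep generative model.

```python
import math


def posterior(log_likelihoods, priors):
    """p(y | x) from log p(x | y) and p(y), via Bayes' rule."""
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    m = max(log_joint)                 # subtract the max for numerical stability
    unnorm = [math.exp(v - m) for v in log_joint]
    z = sum(unnorm)                    # proportional to the evidence p(x)
    return [u / z for u in unnorm]


# Hypothetical values: log p(x | control) and log p(x | patient).
post = posterior([-105.0, -103.0], priors=[0.5, 0.5])
print(post)
```

Working in log space matters here because likelihoods of high-dimensional imaging data are typically far too small to represent directly as floating-point probabilities.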