Paper Group ANR 480
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks. Deep LSTM for Large Vocabulary Continuous Speech Recognition. Inverse Risk-Sensitive Reinforcement Learning. Which phoneme-to-viseme maps best improve visual-only computer lip-reading?. Automating Direct Speech Variations in Stories and Games. Potential Func …
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Title | Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks |
Authors | Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem |
Abstract | An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multitask learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations. |
Tasks | Multi-Task Learning, Question Answering, Visual Question Answering |
Published | 2017-04-02 |
URL | http://arxiv.org/abs/1704.00260v2 |
http://arxiv.org/pdf/1704.00260v2.pdf | |
PWC | https://paperswithcode.com/paper/aligned-image-word-representations-improve |
Repo | |
Framework | |
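The abstract above hinges on scoring image regions and words in a shared embedding space. As a rough illustration only (not the authors' model), the sketch below projects region features and word vectors into a common space with hypothetical projection matrices and computes cosine-similarity scores:

```python
import numpy as np

def score_regions_against_words(region_feats, word_vecs, W_img, W_txt):
    """Score every image region against every word in a shared embedding space.

    region_feats: (R, Dv) CNN features for R regions (hypothetical inputs).
    word_vecs:    (V, Dt) word embeddings for V vocabulary words.
    W_img, W_txt: learned projection matrices into a common D-dim space.
    Returns an (R, V) matrix of cosine similarities.
    """
    img = region_feats @ W_img                      # (R, D)
    txt = word_vecs @ W_txt                         # (V, D)
    img /= np.linalg.norm(img, axis=1, keepdims=True) + 1e-8
    txt /= np.linalg.norm(txt, axis=1, keepdims=True) + 1e-8
    return img @ txt.T                              # word-region affinity scores

# Toy usage with random placeholders for the learned parameters.
rng = np.random.default_rng(0)
R, V, Dv, Dt, D = 5, 100, 512, 300, 128
scores = score_regions_against_words(
    rng.normal(size=(R, Dv)), rng.normal(size=(V, Dt)),
    rng.normal(size=(Dv, D)), rng.normal(size=(Dt, D)))
print(scores.shape)  # (5, 100): affinities shared by the recognition and VQA heads
```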
Deep LSTM for Large Vocabulary Continuous Speech Recognition
Title | Deep LSTM for Large Vocabulary Continuous Speech Recognition |
Authors | Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang |
Abstract | Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks such as speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, deeper networks are more difficult to train. We introduce a training framework with layer-wise training and exponential moving average methods for deeper LSTM models. With this framework, LSTM models of more than 7 layers are successfully trained on Shenma voice search data in Mandarin, and they outperform deep LSTM models trained by the conventional approach. Moreover, for online streaming speech recognition applications, a shallow model with a low real-time factor is distilled from the very deep model. Recognition accuracy suffers little loss in the distillation process. As a result, the model trained with the proposed framework achieves a 14% relative reduction in character error rate compared to the original model with similar real-time capability. Furthermore, a novel transfer learning strategy with segmental Minimum Bayes-Risk is also introduced in the framework. This strategy makes it possible for training on only a small part of the dataset to outperform training on the full dataset from the beginning. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition, Transfer Learning |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07090v1 |
http://arxiv.org/pdf/1703.07090v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-lstm-for-large-vocabulary-continuous |
Repo | |
Framework | |
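The framework above mentions exponential moving average (EMA) methods for training deeper LSTMs. A minimal sketch of maintaining an EMA copy of model parameters is shown below; the "model" and "gradient step" are placeholders, not the paper's training pipeline:

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    """Exponential moving average of parameters: ema <- decay*ema + (1-decay)*param."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params

# Toy "model": two weight matrices standing in for LSTM layer parameters.
rng = np.random.default_rng(0)
params = {"lstm1.W": rng.normal(size=(4, 4)), "lstm2.W": rng.normal(size=(4, 4))}
ema = {k: v.copy() for k, v in params.items()}

for step in range(100):                      # pretend training loop
    for k in params:                         # fake "gradient step"
        params[k] -= 0.01 * rng.normal(size=params[k].shape)
    ema = ema_update(ema, params)            # smoothed weights used for evaluation
```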
Inverse Risk-Sensitive Reinforcement Learning
Title | Inverse Risk-Sensitive Reinforcement Learning |
Authors | Lillian J. Ratliff, Eric Mazumdar |
Abstract | We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human decision-making having their origins in behavioral psychology, behavioral economics, and neuroscience. We propose a gradient-based inverse reinforcement learning algorithm that minimizes a loss function defined on the observed behavior. We demonstrate the performance of the proposed technique on two examples, the first of which is the canonical Grid World example and the second of which is a Markov decision process modeling passengers’ decisions regarding ride-sharing. In the latter, we use pricing and travel time data from a ride-sharing company to construct the transition probabilities and rewards of the Markov decision process. |
Tasks | Decision Making |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09842v3 |
http://arxiv.org/pdf/1703.09842v3.pdf | |
PWC | https://paperswithcode.com/paper/inverse-risk-sensitive-reinforcement-learning |
Repo | |
Framework | |
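As a loose illustration of gradient-based inverse reinforcement learning on observed behavior (omitting the paper's risk-sensitive value transformation), the sketch below fits reward weights on a toy chain MDP by minimizing the negative log-likelihood of demonstrated actions under a softmax policy, using finite-difference gradients:

```python
import numpy as np

# Tiny 4-state chain MDP; action 0 = step left, 1 = step right. The reward is
# linear in one-hot state features with unknown weights theta to be recovered.
P = np.zeros((4, 2, 4))
for s in range(4):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, 3)] = 1.0
features = np.eye(4)
gamma, beta = 0.9, 2.0                      # discount factor, softmax temperature

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def soft_q(theta, iters=200):
    r = features @ theta
    V = np.zeros(4)
    for _ in range(iters):                  # soft value iteration
        Q = r[:, None] + gamma * (P @ V)
        V = (logsumexp(beta * Q, axis=1) / beta).ravel()
    return Q

def nll(theta, demos):
    """Negative log-likelihood of observed (state, action) pairs under the softmax policy."""
    Q = soft_q(theta)
    logpi = beta * Q - logsumexp(beta * Q, axis=1)
    return -sum(logpi[s, a] for s, a in demos)

demos = [(0, 1), (1, 1), (2, 1), (3, 1)]    # the demonstrator always moves right
theta = np.zeros(4)
for _ in range(100):                        # finite-difference gradient descent on the loss
    grad = np.array([(nll(theta + 1e-4 * e, demos) - nll(theta, demos)) / 1e-4
                     for e in np.eye(4)])
    theta -= 0.05 * grad
print(theta.round(2))                       # weights should favour the right-most state
```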
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
Title | Which phoneme-to-viseme maps best improve visual-only computer lip-reading? |
Authors | Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan |
Abstract | A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to units of acoustic speech, the phonemes. Although a number of maps have been published, their effectiveness is infrequently tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01093v1 |
http://arxiv.org/pdf/1710.01093v1.pdf | |
PWC | https://paperswithcode.com/paper/which-phoneme-to-viseme-maps-best-improve |
Repo | |
Framework | |
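A phoneme-to-viseme map is a many-to-one relabeling of phonemes. The sketch below shows one simple (hypothetical) way to derive such a map from a phoneme confusion matrix, in the spirit of the data-driven maps described above, and to apply it to a transcription; it is not the authors' exact clustering procedure:

```python
import numpy as np

PHONEMES = ["p", "b", "m", "f", "v", "t", "d", "s", "z", "k"]

def map_from_confusions(confusion, phonemes, threshold=0.3):
    """Greedily group phonemes whose mutual confusion rate exceeds a threshold.

    confusion[i, j] is the (row-normalised) rate at which phoneme i is
    recognised as phoneme j. Returns a dict phoneme -> viseme label.
    """
    viseme_of, next_label = {}, 0
    for i, p in enumerate(phonemes):
        for j in range(i):
            if confusion[i, j] > threshold and confusion[j, i] > threshold:
                viseme_of[p] = viseme_of[phonemes[j]]   # join an existing viseme class
                break
        else:
            viseme_of[p] = f"V{next_label}"             # start a new viseme class
            next_label += 1
    return viseme_of

def transcribe(phoneme_seq, viseme_of):
    """Collapse a phoneme transcription to its viseme transcription."""
    return [viseme_of[p] for p in phoneme_seq]

rng = np.random.default_rng(0)
C = rng.random((10, 10)); C /= C.sum(axis=1, keepdims=True)
C[0, 1] = C[1, 0] = 0.4                                 # make /p/ and /b/ highly confusable
vmap = map_from_confusions(C, PHONEMES)
print(transcribe(["p", "b", "m", "t"], vmap))
```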
Automating Direct Speech Variations in Stories and Games
Title | Automating Direct Speech Variations in Stories and Games |
Authors | Stephanie M. Lukin, James O. Ryan, Marilyn A. Walker |
Abstract | Dialogue authoring in large games requires not only content creation but the subtlety of its delivery, which can vary from character to character. Manually authoring this dialogue can be tedious, time-consuming, or even altogether infeasible. This paper utilizes a rich narrative representation for modeling dialogue and an expressive natural language generation engine for realizing it, and expands upon a translation tool that bridges the two. We add functionality to the translator to allow direct speech to be modeled by the narrative representation, whereas the original translator supports only narratives told by a third person narrator. We show that we can perform character substitution in dialogues. We implement and evaluate a potential application to dialogue implementation: generating dialogue for games with big, dynamic, or procedurally-generated open worlds. We present a pilot study on human perceptions of the personalities of characters using direct speech, assuming unknown personality types at the time of authoring. |
Tasks | Text Generation |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09090v1 |
http://arxiv.org/pdf/1708.09090v1.pdf | |
PWC | https://paperswithcode.com/paper/automating-direct-speech-variations-in |
Repo | |
Framework | |
Potential Functions based Sampling Heuristic For Optimal Path Planning
Title | Potential Functions based Sampling Heuristic For Optimal Path Planning |
Authors | Ahmed Hussain Qureshi, Yasar Ayaz |
Abstract | Rapidly-exploring Random Tree Star (RRT*) is a recently proposed extension of the Rapidly-exploring Random Tree (RRT) algorithm that provides a collision-free, asymptotically optimal path regardless of the geometry of obstacles in a given environment. However, one limitation of the RRT* algorithm is its slow convergence to an optimal path solution. As a result, it consumes a large amount of memory as well as time, due to the large number of iterations required to reach an optimal solution. To overcome these limitations, we propose the Potential Function Based-RRT* (P-RRT*), which incorporates the Artificial Potential Field algorithm into RRT*. The proposed algorithm allows a considerable decrease in the number of iterations and thus leads to more efficient memory utilization and an accelerated convergence rate. To illustrate the usefulness of the proposed algorithm in terms of memory utilization, execution time, and convergence rate, this paper presents rigorous simulation-based comparisons between the proposed technique and RRT* under different environmental conditions. Moreover, both algorithms are also tested and compared under non-holonomic differential constraints. |
Tasks | |
Published | 2017-04-02 |
URL | http://arxiv.org/abs/1704.00264v1 |
http://arxiv.org/pdf/1704.00264v1.pdf | |
PWC | https://paperswithcode.com/paper/potential-functions-based-sampling-heuristic |
Repo | |
Framework | |
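The core idea of P-RRT* is to bias random samples using an artificial potential field before handing them to the planner. The sketch below, with made-up constants and point obstacles, nudges each uniform sample a few steps down the gradient of an attractive-plus-repulsive potential; it is a simplified stand-in for the paper's directed sampling heuristic:

```python
import numpy as np

GOAL = np.array([9.0, 9.0])
OBSTACLES = [np.array([5.0, 5.0]), np.array([3.0, 7.0])]     # point obstacles for simplicity

def potential_gradient(q, k_att=1.0, k_rep=4.0, rho0=2.0):
    """Gradient of an attractive (goal) plus repulsive (obstacle) potential at q."""
    grad = k_att * (q - GOAL)                                 # attractive term
    for obs in OBSTACLES:
        d = np.linalg.norm(q - obs)
        if 1e-9 < d < rho0:                                   # repulsion acts only within rho0
            grad += k_rep * (1.0 / rho0 - 1.0 / d) * (q - obs) / d**3
    return grad

def guided_sample(rng, step=0.05, n_steps=5, bounds=(0.0, 10.0)):
    """Draw a uniform random sample, then nudge it down the potential gradient
    for a few small steps before returning it to the RRT* planner."""
    q = rng.uniform(bounds[0], bounds[1], size=2)
    for _ in range(n_steps):
        q = np.clip(q - step * potential_gradient(q), *bounds)
    return q

rng = np.random.default_rng(0)
print(np.array([guided_sample(rng) for _ in range(5)]).round(2))
```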
Variational Grid Setting Network
Title | Variational Grid Setting Network |
Authors | Yu-Neng Chuang, Zi-Yu Huang, Yen-Lung Tsai |
Abstract | We propose a new neural network architecture for automatic generation of missing characters in a Chinese font set. We call this architecture the Variational Grid Setting Network; it is based on the variational autoencoder (VAE) with some tweaks. The model is able to generate missing characters that are relatively large in size ($256 \times 256$ pixels). Moreover, we show that one can use very few samples for the training set and still obtain satisfactory results. |
Tasks | |
Published | 2017-09-30 |
URL | http://arxiv.org/abs/1710.01255v3 |
http://arxiv.org/pdf/1710.01255v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-grid-setting-network |
Repo | |
Framework | |
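The model above builds on the variational autoencoder. The sketch below shows only the generic VAE core it relies on (reparameterization trick and ELBO) with toy linear encoder/decoder and tiny dimensions, not the paper's architecture or its tweaks:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W_mu, W_logvar):
    """Toy linear encoder producing the mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar

def decoder(z, W_dec):
    """Toy linear decoder producing Bernoulli pixel probabilities."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))

def elbo(x, W_mu, W_logvar, W_dec):
    mu, logvar = encoder(x, W_mu, W_logvar)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps                  # reparameterization trick
    x_hat = decoder(z, W_dec)
    recon = np.sum(x * np.log(x_hat + 1e-8) + (1 - x) * np.log(1 - x_hat + 1e-8))
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return recon - kl                                    # maximise this during training

D, Z = 64, 8                                             # tiny stand-in for 256x256 glyphs
x = (rng.random((1, D)) > 0.5).astype(float)             # one fake binary "character image"
print(elbo(x, rng.normal(size=(D, Z)) * 0.1,
           rng.normal(size=(D, Z)) * 0.1,
           rng.normal(size=(Z, D)) * 0.1))
```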
Privacy-Preserving Personal Model Training
Title | Privacy-Preserving Personal Model Training |
Authors | Sandra Servia-Rodriguez, Liang Wang, Jianxin R. Zhao, Richard Mortier, Hamed Haddadi |
Abstract | Many current Internet services rely on inferences from models trained on user data. Commonly, both the training and inference tasks are carried out using cloud resources fed by personal data collected at scale from users. Holding and using such large collections of personal data in the cloud creates privacy risks to the data subjects, but is currently required for users to benefit from such services. We explore how to provide for model training and inference in a system where computation is pushed to the data in preference to moving data to the cloud, obviating many current privacy risks. Specifically, we take an initial model learnt from a small set of users and retrain it locally using data from a single user. We evaluate on two tasks: one supervised learning task, using a neural network to recognise users’ current activity from accelerometer traces; and one unsupervised learning task, identifying topics in a large set of documents. In both cases the accuracy is improved. We also analyse the robustness of our approach against adversarial attacks, as well as its feasibility by presenting a performance evaluation on a representative resource-constrained device (a Raspberry Pi). |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00380v3 |
http://arxiv.org/pdf/1703.00380v3.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-personal-model-training |
Repo | |
Framework | |
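The setup above starts from a model learnt on a small seed set of users and retrains it locally on one user's data. A minimal sketch of that local fine-tuning step for a logistic-regression stand-in is shown below; features, labels, and hyperparameters are all hypothetical:

```python
import numpy as np

def local_finetune(w_global, X_user, y_user, lr=0.1, epochs=20):
    """Fine-tune a shared logistic-regression model on one user's device.

    Only the initial weights w_global come from the cloud; the user's data
    (X_user, y_user) and the updated weights never leave the device.
    """
    w = w_global.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_user @ w)))          # predicted probabilities
        grad = X_user.T @ (p - y_user) / len(y_user)     # logistic-loss gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_global = rng.normal(size=5) * 0.1                      # model trained on a small seed group
X_user = rng.normal(size=(50, 5))                        # e.g. accelerometer features
y_user = (X_user[:, 0] + 0.3 * rng.normal(size=50) > 0).astype(float)
w_personal = local_finetune(w_global, X_user, y_user)
print(np.round(w_personal, 2))
```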
General Backpropagation Algorithm for Training Second-order Neural Networks
Title | General Backpropagation Algorithm for Training Second-order Neural Networks |
Authors | Fenglei Fan, Wenxiang Cong, Ge Wang |
Abstract | The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd-order counterparts, in which the linear operation between the inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd-order neuron already has strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train networks consisting of 2nd-order neurons. Numerical studies are performed to verify the generalized BP algorithm. |
Tasks | |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.06243v1 |
http://arxiv.org/pdf/1708.06243v1.pdf | |
PWC | https://paperswithcode.com/paper/general-backpropagation-algorithm-for |
Repo | |
Framework | |
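One plausible form of a 2nd-order neuron is a sigmoid applied to the product of two affine maps of the input (the paper's exact parameterization may differ). The sketch below implements the forward pass, backpropagates through the product by the chain rule, and checks one analytic gradient against a finite difference:

```python
import numpy as np

def quad_neuron(x, w1, b1, w2, b2):
    """One plausible 2nd-order neuron: sigmoid of a product of two affine maps."""
    u, v = w1 @ x + b1, w2 @ x + b2
    return 1.0 / (1.0 + np.exp(-(u * v))), (u, v)

def quad_neuron_grads(x, w1, b1, w2, b2, dy):
    """Backpropagate dL/dy through the neuron to its parameters."""
    y, (u, v) = quad_neuron(x, w1, b1, w2, b2)
    dz = dy * y * (1.0 - y)                  # sigmoid derivative
    return {"w1": dz * v * x, "b1": dz * v,  # product rule: d(uv)/du = v, d(uv)/dv = u
            "w2": dz * u * x, "b2": dz * u}

# Numerical check of the analytic gradient for w1[0].
rng = np.random.default_rng(0)
x, w1, w2 = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
b1, b2, eps = 0.1, -0.2, 1e-6
g = quad_neuron_grads(x, w1, b1, w2, b2, dy=1.0)
w1p = w1.copy(); w1p[0] += eps
num = (quad_neuron(x, w1p, b1, w2, b2)[0] - quad_neuron(x, w1, b1, w2, b2)[0]) / eps
print(g["w1"][0], num)                       # the two values should agree closely
```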
Local Word Vectors Guiding Keyphrase Extraction
Title | Local Word Vectors Guiding Keyphrase Extraction |
Authors | Eirini Papagiannopoulou, Grigorios Tsoumakas |
Abstract | Automated keyphrase extraction is a fundamental textual information processing task concerned with the selection of representative phrases from a document that summarize its content. This work presents a novel unsupervised method for keyphrase extraction, whose main innovation is the use of local word embeddings (in particular GloVe vectors), i.e., embeddings trained from the single document under consideration. We argue that such local representations of words and keyphrases are able to accurately capture their semantics in the context of the document they are part of, and can therefore help improve keyphrase extraction quality. Empirical results offer evidence that local representations indeed lead to better keyphrase extraction results compared both to embeddings trained on very large third-party corpora, or on larger corpora consisting of several documents from the same scientific field, and to other state-of-the-art unsupervised keyphrase extraction methods. |
Tasks | Word Embeddings |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07503v4 |
http://arxiv.org/pdf/1710.07503v4.pdf | |
PWC | https://paperswithcode.com/paper/local-word-vectors-guiding-keyphrase |
Repo | |
Framework | |
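Assuming local word vectors have already been trained on the single document (the GloVe training step is skipped here), the sketch below illustrates one simple ranking step: score candidate phrases by cosine similarity between their mean vector and the document's mean vector. It is an illustration of using local embeddings, not the authors' full method:

```python
import numpy as np

def rank_candidates(candidates, local_vecs):
    """Rank candidate keyphrases by cosine similarity between the mean of their
    word vectors and the mean vector of the whole document (a simple proxy for
    a 'document topic' vector built from local embeddings)."""
    doc_vec = np.mean(list(local_vecs.values()), axis=0)
    doc_vec /= np.linalg.norm(doc_vec) + 1e-8
    scores = {}
    for phrase in candidates:
        vecs = [local_vecs[w] for w in phrase.split() if w in local_vecs]
        if not vecs:
            continue
        v = np.mean(vecs, axis=0)
        scores[phrase] = float(v @ doc_vec / (np.linalg.norm(v) + 1e-8))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy local vectors, standing in for GloVe trained on the single document.
rng = np.random.default_rng(0)
vocab = ["keyphrase", "extraction", "unsupervised", "banana"]
local_vecs = {w: rng.normal(size=16) for w in vocab}
print(rank_candidates(["keyphrase extraction", "unsupervised extraction", "banana"],
                      local_vecs))
```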
Gaussian Three-Dimensional kernel SVM for Edge Detection Applications
Title | Gaussian Three-Dimensional kernel SVM for Edge Detection Applications |
Authors | Safar Irandoust-Pakchin, Aydin Ayanzadeh, Siamak Beikzadeh |
Abstract | This paper presents a novel and uniform algorithm for edge detection based on an SVM (support vector machine) with a three-dimensional Gaussian radial basis function kernel. Traditional edge detection suffers from disadvantages such as inaccurate edge localization, rough edges, and poor detection of soft edges. The experimental results indicate that the SVM can detect edges efficiently. The performance of the proposed algorithm is compared with existing methods, including the Sobel and Canny detectors. The results show that this method performs better than classical algorithms such as the Canny and Sobel detectors. |
Tasks | Edge Detection |
Published | 2017-09-30 |
URL | http://arxiv.org/abs/1710.01260v1 |
http://arxiv.org/pdf/1710.01260v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-three-dimensional-kernel-svm-for |
Repo | |
Framework | |
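As a rough stand-in for the approach above, the sketch below trains a standard RBF (Gaussian) kernel SVM from scikit-learn to classify synthetic 3x3 patches as edge or non-edge; the paper's specific three-dimensional kernel and real-image pipeline are not reproduced:

```python
import numpy as np
from sklearn.svm import SVC

def make_patch(edge, rng):
    """3x3 grey patch: either flat noise or a vertical step edge plus noise."""
    p = rng.normal(0.5, 0.05, size=(3, 3))
    if edge:
        p[:, 2] += 0.5                    # bright right column -> vertical edge
    return p.ravel()

rng = np.random.default_rng(0)
X = np.array([make_patch(i % 2 == 1, rng) for i in range(400)])
y = np.array([i % 2 for i in range(400)])

# RBF (Gaussian) kernel SVM classifying patches as edge / non-edge.
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```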
Linear centralization classifier
Title | Linear centralization classifier |
Authors | Mohammad Reza Bonyadi, Viktor Vegh, David C. Reutens |
Abstract | A classification algorithm, called the Linear Centralization Classifier (LCC), is introduced. The algorithm seeks to find a transformation that best maps instances from the feature space to a space where they concentrate towards the centers of their own classes, while maximizing the distance between class centers. We formulate the classifier as a quadratic program with quadratic constraints. We then simplify this formulation to a linear program that can be solved effectively using a linear programming solver (e.g., simplex-dual). We extend the formulation for LCC to enable the use of kernel functions for non-linear classification applications. We compare our method with two standard classification methods (support vector machine and linear discriminant analysis) and four state-of-the-art classification methods when they are applied to eight standard classification datasets. Our experimental results show that LCC classifies instances more accurately (based on the area under the receiver operating characteristic curve) than the other tested methods on the chosen datasets. We also report results for LCC with a particular kernel on synthetic non-linear classification problems. |
Tasks | |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08259v1 |
http://arxiv.org/pdf/1712.08259v1.pdf | |
PWC | https://paperswithcode.com/paper/linear-centralization-classifier |
Repo | |
Framework | |
A Deep Network with Visual Text Composition Behavior
Title | A Deep Network with Visual Text Composition Behavior |
Authors | Hongyu Guo |
Abstract | While natural languages are compositional, how state-of-the-art neural models achieve compositionality is still unclear. We propose a deep network, which not only achieves competitive accuracy for text classification, but also exhibits compositional behavior. That is, while creating hierarchical representations of a piece of text, such as a sentence, the lower layers of the network distribute their layer-specific attention weights to individual words. In contrast, the higher layers compose meaningful phrases and clauses, whose lengths increase as the networks get deeper until fully composing the sentence. |
Tasks | Text Classification |
Published | 2017-07-05 |
URL | http://arxiv.org/abs/1707.01555v1 |
http://arxiv.org/pdf/1707.01555v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-network-with-visual-text-composition |
Repo | |
Framework | |
FML-based Prediction Agent and Its Application to Game of Go
Title | FML-based Prediction Agent and Its Application to Game of Go |
Authors | Chang-Shing Lee, Mei-Hui Wang, Chia-Hsiu Kao, Sheng-Chi Yang, Yusuke Nojima, Ryosuke Saga, Nan Shuo, Naoyuki Kubota |
Abstract | In this paper, we present a robotic prediction agent, including a darkforest Go engine, a fuzzy markup language (FML) assessment engine, an FML-based decision support engine, and a robot engine, for the game of Go. The knowledge base and rule base of the FML assessment engine are constructed by referring to information from the darkforest Go engine located at NUTN and OPU, for example, the number of MCTS simulations and the winning rate prediction. The proposed robotic prediction agent first retrieves data from the Go competition website, and then the FML assessment engine infers the winning possibility based on the information generated by the darkforest Go engine. The FML-based decision support engine computes the winning possibility based on the partial game situation inferred by the FML assessment engine. Finally, the robot engine combines with the human-friendly robot partner PALRO, produced by Fujisoft Incorporated, to report the game situation to human Go players. Experimental results show that the FML-based prediction agent works effectively. |
Tasks | Game of Go |
Published | 2017-04-16 |
URL | http://arxiv.org/abs/1704.04719v1 |
http://arxiv.org/pdf/1704.04719v1.pdf | |
PWC | https://paperswithcode.com/paper/fml-based-prediction-agent-and-its |
Repo | |
Framework | |
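FML encodes fuzzy rule bases in XML; the hand-coded sketch below only illustrates the kind of inference such a rule base performs, mapping the number of MCTS simulations and a predicted winning rate to a "winning possibility". Membership functions, rules, and thresholds are all invented for illustration, not taken from the authors' knowledge base:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def infer_winning_possibility(mcts_sims, win_rate):
    """Toy fuzzy inference: two inputs, three rules, defuzzified by a
    weighted average of each rule's output level."""
    sims_high = tri(mcts_sims, 30_000, 80_000, 130_000)
    rate_high = tri(win_rate, 0.5, 0.8, 1.1)
    rate_low  = tri(win_rate, -0.1, 0.2, 0.5)
    # rule strength (min of antecedents) -> output level for "winning possibility"
    rules = [(min(sims_high, rate_high), 0.9),   # confident engine, high rate -> likely win
             (min(sims_high, rate_low),  0.1),   # confident engine, low rate  -> likely loss
             (1.0 - sims_high,           0.5)]   # few simulations -> uncertain
    num = sum(w * level for w, level in rules)
    den = sum(w for w, _ in rules) or 1.0
    return num / den

print(infer_winning_possibility(mcts_sims=100_000, win_rate=0.85))
print(infer_winning_possibility(mcts_sims=10_000, win_rate=0.85))
```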
Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis
Title | Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis |
Authors | Takashi Matsubara, Tetsuo Tashiro, Kuniaki Uehara |
Abstract | Accurate diagnosis of psychiatric disorders plays a critical role in improving the quality of life for patients and potentially supports the development of new treatments. Many studies have been conducted on machine learning techniques that seek brain imaging data for specific biomarkers of disorders. These studies have encountered the following dilemma: A direct classification overfits to a small number of high-dimensional samples but unsupervised feature-extraction has the risk of extracting a signal of no interest. In addition, such studies often provided only diagnoses for patients without presenting the reasons for these diagnoses. This study proposed a deep neural generative model of resting-state functional magnetic resonance imaging (fMRI) data. The proposed model is conditioned by the assumption of the subject’s state and estimates the posterior probability of the subject’s state given the imaging data, using Bayes’ rule. This study applied the proposed model to diagnose schizophrenia and bipolar disorders. Diagnostic accuracy was improved by a large margin over competitive approaches, namely classifications of functional connectivity, discriminative/generative models of region-wise signals, and those with unsupervised feature-extractors. The proposed model visualizes brain regions largely related to the disorders, thus motivating further biological investigation. |
Tasks | |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06260v2 |
http://arxiv.org/pdf/1712.06260v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-generative-model-of-functional |
Repo | |
Framework | |
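The diagnosis step described above applies Bayes' rule to class-conditional likelihoods from the generative model. A minimal sketch of that posterior computation (with hypothetical log-likelihoods and priors, done in log space for numerical stability) is:

```python
import numpy as np

def diagnose(log_likelihoods, log_priors):
    """Posterior over diagnostic classes from a class-conditional generative model,
    via Bayes' rule: p(c|x) is proportional to p(x|c) p(c)."""
    logp = log_likelihoods + log_priors
    logp -= logp.max()                       # stabilise before exponentiating
    post = np.exp(logp)
    return post / post.sum()

# Hypothetical log-likelihoods of one subject's fMRI scan under three
# class-conditional models (healthy control, schizophrenia, bipolar disorder).
log_lik = np.array([-1052.3, -1049.8, -1051.1])
log_prior = np.log(np.array([0.5, 0.25, 0.25]))
print(diagnose(log_lik, log_prior).round(3))
```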