Paper Group ANR 427
Learning Abstract Options. Variational Inference for Data-Efficient Model Learning in POMDPs. Visual-Only Recognition of Normal, Whispered and Silent Speech. Diagnostic Accuracy of Content Based Dermatoscopic Image Retrieval with Deep Classification Features. Implicit Autoencoders. Fidelity-based Probabilistic Q-learning for Control of Quantum Systems. Analyzing deep CNN-based utterance embeddings for acoustic model adaptation. A^2Net: Adjacent Aggregation Networks for Image Raindrop Removal. Nesting Probabilistic Programs. Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications. Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory. Visually grounded cross-lingual keyword spotting in speech. Learning Singularity Avoidance. When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size). Training Generative Adversarial Networks with Weights.
Learning Abstract Options
Title | Learning Abstract Options |
Authors | Matthew Riemer, Miao Liu, Gerald Tesauro |
Abstract | Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning. One popular approach for addressing this challenge is the options framework (Sutton et al., 1999). However, only recently was a policy gradient theorem derived for online learning of general-purpose options in an end-to-end fashion (Bacon et al., 2017). In this work, we extend previous work on this topic, which only focuses on learning a two-level hierarchy of options and primitive actions, to enable learning simultaneously at multiple resolutions in time. We achieve this by considering an arbitrarily deep hierarchy of options where high-level temporally extended options are composed of lower-level options with finer resolutions in time. We extend results from (Bacon et al., 2017) and derive policy gradient theorems for a deep hierarchy of options. Our proposed hierarchical option-critic architecture is capable of learning internal policies, termination conditions, and hierarchical compositions over options without the need for any intrinsic rewards or subgoals. Our empirical results in both discrete and continuous environments demonstrate the efficiency of our framework. |
Tasks | |
Published | 2018-10-27 |
URL | https://arxiv.org/abs/1810.11583v4 |
PDF | https://arxiv.org/pdf/1810.11583v4.pdf |
PWC | https://paperswithcode.com/paper/learning-abstract-options |
Repo | |
Framework | |
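A minimal sketch (not the authors' code) of the core structural idea: an arbitrarily deep hierarchy in which abstract options pick lower-level options until their own termination condition fires. The uniform child choice and fixed termination probabilities below stand in for the learned internal policies and termination functions of the hierarchical option-critic.

```python
import random

random.seed(0)

class Option:
    """A node in an option hierarchy; options with no children are primitive."""
    def __init__(self, name, children=None, term_prob=0.5):
        self.name = name
        self.children = children or []   # lower-level options
        self.term_prob = term_prob       # stand-in for a learned termination function

def run(option, env_step):
    """Run an option to completion.

    A primitive option acts once. An abstract option repeatedly picks a
    child (uniformly here, in place of a learned internal policy), runs it
    to completion, then samples its own termination condition.
    """
    if not option.children:
        env_step(option.name)
        return
    while True:
        run(random.choice(option.children), env_step)
        if random.random() < option.term_prob:
            return

# A three-level hierarchy: root -> {explore, exploit} -> primitive actions.
left, right = Option("go-left"), Option("go-right")
explore = Option("explore", [left, right], term_prob=0.5)
exploit = Option("exploit", [right], term_prob=0.5)
root = Option("root", [explore, exploit], term_prob=0.4)

run(root, env_step=lambda a: print("primitive action:", a))
```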
Variational Inference for Data-Efficient Model Learning in POMDPs
Title | Variational Inference for Data-Efficient Model Learning in POMDPs |
Authors | Sebastian Tschiatschek, Kai Arulkumaran, Jan Stühmer, Katja Hofmann |
Abstract | Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real-world tasks. Today, planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet an open question is how to acquire accurate models for complex domains. In this paper we propose DELIP, an approach to model learning for POMDPs that utilizes amortized structured variational inference. We empirically show that our model leads to effective control strategies when coupled with state-of-the-art planners. Intuitively, model-based approaches should be particularly beneficial in environments with changing reward structures, or where rewards are initially unknown. Our experiments confirm that DELIP is particularly effective in this setting. |
Tasks | Decision Making, Decision Making Under Uncertainty |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09281v1 |
PDF | http://arxiv.org/pdf/1805.09281v1.pdf |
PWC | https://paperswithcode.com/paper/variational-inference-for-data-efficient |
Repo | |
Framework | |
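DELIP builds on amortized structured variational inference over POMDP trajectories; the sketch below shows only the generic building block it relies on: a one-sample ELBO estimate with the reparameterization trick for a toy Gaussian latent-variable model. The linear "encoder" and unit-variance likelihood are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_single_step(x, enc_w, dec_w):
    """One-sample ELBO estimate for a toy Gaussian latent-variable model.

    q(z|x) = N(mu, sigma^2) with (mu, log sigma) from a linear 'encoder';
    p(x|z) = N(dec_w * z, 1); prior p(z) = N(0, 1).
    """
    mu, log_sigma = enc_w[0] * x, enc_w[1] * x
    eps = rng.standard_normal()
    z = mu + np.exp(log_sigma) * eps                  # reparameterization trick
    recon = -0.5 * (x - dec_w * z) ** 2               # log p(x|z) up to a constant
    kl = 0.5 * (mu**2 + np.exp(2 * log_sigma) - 2 * log_sigma - 1)  # KL(q || N(0,1))
    return recon - kl

print(elbo_single_step(x=1.5, enc_w=np.array([0.8, -1.0]), dec_w=0.9))
```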
Visual-Only Recognition of Normal, Whispered and Silent Speech
Title | Visual-Only Recognition of Normal, Whispered and Silent Speech |
Authors | Stavros Petridis, Jie Shen, Doruk Cetin, Maja Pantic |
Abstract | Silent speech interfaces have recently been proposed as a way to enable communication when the acoustic signal is not available. This introduces the need to build visual speech recognition systems for silent and whispered speech. However, almost all the recently proposed systems have been trained on vocalised data only. This is in contrast with evidence in the literature which suggests that lip movements change depending on the speech mode. In this work, we introduce a new audiovisual database which is publicly available and contains normal, whispered and silent speech. To the best of our knowledge, this is the first study which investigates the differences between the three speech modes using the visual modality only. We show that an absolute decrease in classification rate of up to 3.7% is observed when training on normal speech and testing on whispered speech, and vice versa. An even higher decrease of up to 8.5% is reported when the models are tested on silent speech. This reveals that there are indeed visual differences between the three speech modes, and that the common assumption that vocalised training data can be used directly to train a silent speech recognition system may not hold. |
Tasks | Speech Recognition, Visual Speech Recognition |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06399v1 |
PDF | http://arxiv.org/pdf/1802.06399v1.pdf |
PWC | https://paperswithcode.com/paper/visual-only-recognition-of-normal-whispered |
Repo | |
Framework | |
Diagnostic Accuracy of Content Based Dermatoscopic Image Retrieval with Deep Classification Features
Title | Diagnostic Accuracy of Content Based Dermatoscopic Image Retrieval with Deep Classification Features |
Authors | Philipp Tschandl, Giuseppe Argenziano, Majid Razmara, Jordan Yap |
Abstract | Background: Automated classification of medical images through neural networks can reach high accuracy rates but lacks interpretability. Objectives: To compare the diagnostic accuracy obtained by using content-based image retrieval (CBIR) to retrieve visually similar dermatoscopic images with corresponding disease labels against predictions made by a neural network. Methods: A neural network was trained to predict disease classes on dermatoscopic images from three retrospectively collected image datasets containing 888, 2750 and 16691 images, respectively. Diagnosis predictions were made based on the most commonly occurring diagnosis in visually similar images, or based on the top-1 class prediction of the softmax output from the network. Outcome measures were area under the ROC curve for predicting a malignant lesion (AUC), multiclass accuracy and mean average precision (mAP), measured on unseen test images of the corresponding dataset. Results: In all three datasets the skin cancer predictions from CBIR (evaluating the 16 most similar images) showed AUC values similar to softmax predictions (0.842, 0.806 and 0.852 versus 0.830, 0.810 and 0.847, respectively; p-value>0.99 for all). Similarly, the multiclass accuracy of CBIR was comparable to softmax predictions. Networks trained for detecting only 3 classes performed better on a dataset with 8 classes when using CBIR as compared to softmax predictions (mAP 0.184 vs. 0.368 and 0.198 vs. 0.403, respectively). Conclusions: Presenting visually similar images based on features from a neural network shows comparable accuracy to the softmax probability-based diagnoses of convolutional neural networks. CBIR may be more helpful than a softmax classifier in improving the diagnostic accuracy of clinicians in a routine clinical setting. |
Tasks | Content-Based Image Retrieval, Image Retrieval |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09487v1 |
PDF | http://arxiv.org/pdf/1810.09487v1.pdf |
PWC | https://paperswithcode.com/paper/diagnostic-accuracy-of-content-based |
Repo | |
Framework | |
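The paper's CBIR diagnosis reduces to nearest-neighbour retrieval in deep-feature space followed by a majority vote over the retrieved labels (k=16 in the reported results). A hedged sketch with random stand-in features; cosine similarity is an assumption here, as the paper's exact distance may differ.

```python
import numpy as np
from collections import Counter

def cbir_diagnose(query_feat, gallery_feats, gallery_labels, k=16):
    """Diagnose by majority vote over the k most similar gallery images."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery image
    top_k = np.argsort(-sims)[:k]     # indices of the k most similar images
    votes = Counter(gallery_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]

# Toy example: 100 gallery images with 64-d deep features and 3 diagnosis classes.
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 64))
labels = rng.integers(0, 3, size=100)
print("CBIR diagnosis:", cbir_diagnose(rng.standard_normal(64), feats, labels))
```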
Implicit Autoencoders
Title | Implicit Autoencoders |
Authors | Alireza Makhzani |
Abstract | In this paper, we describe the “implicit autoencoder” (IAE), a generative autoencoder in which both the generative path and the recognition path are parametrized by implicit distributions. We use two generative adversarial networks to define the reconstruction and the regularization cost functions of the implicit autoencoder, and derive the learning rules based on maximum-likelihood learning. Using implicit distributions allows us to learn more expressive posterior and conditional likelihood distributions for the autoencoder. Learning an expressive conditional likelihood distribution enables the latent code to only capture the abstract and high-level information of the data, while the remaining low-level information is captured by the implicit conditional likelihood distribution. We show the applications of implicit autoencoders in disentangling content and style information, clustering, semi-supervised classification, learning expressive variational distributions, and multimodal image-to-image translation from unpaired data. |
Tasks | Image-to-Image Translation |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09804v2 |
PDF | http://arxiv.org/pdf/1805.09804v2.pdf |
PWC | https://paperswithcode.com/paper/implicit-autoencoders |
Repo | |
Framework | |
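A heavily hedged structural sketch of the implicit autoencoder's two-GAN layout: one discriminator defines the reconstruction cost in data space, the other the regularization cost in code space, and both encoder and decoder receive extra noise so that their output distributions are implicit. Network sizes and the single joint loss shown are illustrative; the discriminators' own updates are omitted.

```python
import torch
import torch.nn as nn

dim_x, dim_z = 8, 2
# Encoder and decoder take extra noise, making their output distributions implicit.
enc = nn.Sequential(nn.Linear(dim_x + dim_z, 16), nn.ReLU(), nn.Linear(16, dim_z))
dec = nn.Sequential(nn.Linear(dim_z + dim_z, 16), nn.ReLU(), nn.Linear(16, dim_x))
d_recon = nn.Sequential(nn.Linear(dim_x + dim_x, 16), nn.ReLU(), nn.Linear(16, 1))
d_reg = nn.Sequential(nn.Linear(dim_z, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(32, dim_x)
z = enc(torch.cat([x, torch.randn(32, dim_z)], dim=1))       # implicit posterior
x_hat = dec(torch.cat([z, torch.randn(32, dim_z)], dim=1))   # implicit likelihood

bce = nn.BCEWithLogitsLoss()
# Reconstruction cost: a GAN in data space on (x, x_hat) pairs.
recon_logits = d_recon(torch.cat([x, x_hat], dim=1))
# Regularization cost: a GAN in code space pushing q(z) toward the prior.
reg_logits = d_reg(z)
loss = bce(recon_logits, torch.ones_like(recon_logits)) + \
       bce(reg_logits, torch.ones_like(reg_logits))
print(f"encoder/decoder adversarial loss: {loss.item():.3f}")
```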
Fidelity-based Probabilistic Q-learning for Control of Quantum Systems
Title | Fidelity-based Probabilistic Q-learning for Control of Quantum Systems |
Authors | Chunlin Chen, Daoyi Dong, Han-Xiong Li, Jian Chu, Tzyh-Jong Tarn |
Abstract | The balance between exploration and exploitation is a key problem for reinforcement learning methods, especially for Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to naturally solve this problem and is applied to learning control of quantum systems. In this approach, fidelity is adopted to help direct the learning process, and the probability of each action being selected at a certain state is updated iteratively along with the learning process, which leads to a natural exploration strategy rather than one with manually configured parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. Then the FPQL algorithm is presented for learning control of quantum systems. Two examples (a spin-1/2 system and a lambda-type atomic system) are used to test the performance of the FPQL algorithm. The results show that the FPQL algorithm attains a better balance between exploration and exploitation, and can also avoid local optimal policies and accelerate the learning process. |
Tasks | Q-Learning |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03145v1 |
PDF | http://arxiv.org/pdf/1806.03145v1.pdf |
PWC | https://paperswithcode.com/paper/fidelity-based-probabilistic-q-learning-for |
Repo | |
Framework | |
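A tabular sketch of probabilistic Q-learning on a toy MDP: actions are drawn from per-state probability tables that are nudged toward the greedy action after each Q update, so exploration anneals naturally. The probability-update rule below is an assumed simple form, and the 0/1 "fidelity" reward merely stands in for the quantum-state fidelity used in FPQL.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
P = np.full((n_states, n_actions), 1.0 / n_actions)   # action-selection probabilities

alpha, gamma, beta = 0.1, 0.9, 0.05   # beta: probability-update rate (assumed form)

def step(s, a):
    """Toy MDP stand-in; in FPQL the reward would be a quantum-state fidelity."""
    s_next = (s + a) % n_states
    fidelity_reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, fidelity_reward

s = 0
for _ in range(2000):
    a = rng.choice(n_actions, p=P[s])                 # probabilistic action selection
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    greedy = Q[s].argmax()
    P[s] *= (1 - beta)                                # decay all probabilities ...
    P[s, greedy] += beta                              # ... and boost the greedy action
    s = s_next

print("learned action probabilities per state:\n", P.round(2))
```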
Analyzing deep CNN-based utterance embeddings for acoustic model adaptation
Title | Analyzing deep CNN-based utterance embeddings for acoustic model adaptation |
Authors | Joanna Rownicka, Peter Bell, Steve Renals |
Abstract | We explore why deep convolutional neural networks (CNNs) with small two-dimensional kernels, primarily used for modeling spatial relations in images, are also effective in speech recognition. We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation. To explore whether interpretable information can be decoded from the learned representations we evaluate their ability to discriminate between speakers, acoustic conditions, noise type, and gender using the Aurora-4 dataset. We extract both whole model embeddings (to capture the information learned across the whole network) and layer-specific embeddings which enable understanding of the flow of information across the network. We also use learned representations as the additional input for a time-delay neural network (TDNN) for the Aurora-4 and MGB-3 English datasets. We find that deep CNN embeddings outperform DNN embeddings for acoustic model adaptation and auxiliary features based on deep CNN embeddings result in similar word error rates to i-vectors. |
Tasks | Speech Recognition |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04708v1 |
PDF | http://arxiv.org/pdf/1811.04708v1.pdf |
PWC | https://paperswithcode.com/paper/analyzing-deep-cnn-based-utterance-embeddings |
Repo | |
Framework | |
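The analysis rests on turning network activations into fixed-size utterance embeddings. One simple reading, sketched below with a toy CNN in PyTorch, is to average each layer's activations over the time and frequency axes to obtain layer-specific embeddings; the paper's exact pooling and architectures may differ.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy acoustic CNN; the paper's networks are much deeper."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        h1 = self.conv1(x)
        h2 = self.conv2(h1)
        logits = self.head(h2.mean(dim=(2, 3)))
        return logits, (h1, h2)

def utterance_embeddings(model, feats):
    """Layer-specific embeddings: average activations over time and frequency.

    'feats' is a (1, 1, time, freq) spectrogram-like tensor; pooling over the
    whole utterance is one simple choice, assumed here for illustration.
    """
    with torch.no_grad():
        _, layers = model(feats)
    return [h.mean(dim=(2, 3)).squeeze(0) for h in layers]

embs = utterance_embeddings(SmallCNN(), torch.randn(1, 1, 200, 40))
print([e.shape for e in embs])   # per-layer embedding sizes: [8], [16]
```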
A^2Net: Adjacent Aggregation Networks for Image Raindrop Removal
Title | A^2Net: Adjacent Aggregation Networks for Image Raindrop Removal |
Authors | Huangxing Lin, Xueyang Fu, Changxing Jing, Xinghao Ding, Yue Huang |
Abstract | Existing methods for single-image raindrop removal either have poor robustness or suffer from heavy parameter burdens. In this paper, we propose a new Adjacent Aggregation Network (A^2Net) with a lightweight architecture to remove raindrops from single images. Instead of directly cascading convolutional layers, we design an adjacent aggregation architecture to better fuse features and generate rich representations, which leads to high-quality image reconstruction. To further simplify the learning process, we utilize problem-specific knowledge to force the network to focus on the luminance channel in the YUV color space instead of all RGB channels. By combining the adjacent aggregation operation with a color-space transformation, the proposed A^2Net achieves state-of-the-art performance on raindrop removal with a significant reduction in parameters. |
Tasks | Rain Removal |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09780v1 |
PDF | http://arxiv.org/pdf/1811.09780v1.pdf |
PWC | https://paperswithcode.com/paper/a2net-adjacent-aggregation-networks-for-image |
Repo | |
Framework | |
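The color-space half of the method is easy to make concrete: convert RGB to YUV, restore only the luminance channel, and convert back. The sketch below uses BT.601 matrices and an identity function in place of the actual aggregation network, which it does not implement.

```python
import numpy as np

# BT.601 RGB -> YUV matrix; restoring only Y follows the paper's
# problem-specific prior that raindrop artifacts live mainly in luminance.
RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                    [-0.147, -0.289,  0.436],
                    [ 0.615, -0.515, -0.100]])

def remove_raindrops_luminance_only(rgb, restore_y):
    """Transform to YUV, run the restoration network on Y only, and invert.

    'restore_y' stands in for the A^2Net aggregation network.
    """
    yuv = rgb @ RGB2YUV.T
    yuv[..., 0] = restore_y(yuv[..., 0])     # only the luminance channel is restored
    return np.clip(yuv @ np.linalg.inv(RGB2YUV).T, 0.0, 1.0)

img = np.random.default_rng(0).random((4, 4, 3))
out = remove_raindrops_luminance_only(img, restore_y=lambda y: y)  # identity stand-in
print(np.allclose(out, img, atol=1e-6))   # identity restoration round-trips the image
```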
Nesting Probabilistic Programs
Title | Nesting Probabilistic Programs |
Authors | Tom Rainforth |
Abstract | We formalize the notion of nesting probabilistic programming queries and investigate the resulting statistical implications. We demonstrate that while query nesting allows the definition of models which could not otherwise be expressed, such as those involving agents reasoning about other agents, existing systems take approaches which lead to inconsistent estimates. We show how to correct this by delineating possible ways one might want to nest queries and asserting the respective conditions required for convergence. We further introduce a new online nested Monte Carlo estimator that makes it substantially easier to ensure these conditions are met, thereby providing a simple framework for designing statistically correct inference engines. We prove the correctness of this online estimator and show that, when using the recommended setup, its asymptotic variance is always better than that of the equivalent fixed estimator, while its bias is always within a factor of two. |
Tasks | Probabilistic Programming |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06328v2 |
PDF | http://arxiv.org/pdf/1803.06328v2.pdf |
PWC | https://paperswithcode.com/paper/nesting-probabilistic-programs |
Repo | |
Framework | |
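The statistical pitfall the paper formalizes is easy to reproduce: for a nested expectation E_x[f(E_y[g(x,y)])] with nonlinear f, plugging a single-sample inner estimate into f gives an inconsistent answer. In the toy below the true value is 1, but the one-inner-sample estimator converges to 2; the bias decays as O(1/n_inner).

```python
import numpy as np

rng = np.random.default_rng(0)

# Nested expectation I = E_x[ f( E_y[ g(x, y) ] ) ] with nonlinear f.
# For x, y ~ N(0, 1), g(x, y) = x + y and f(u) = u**2, the inner mean is x,
# so the true value is I = E[x**2] = 1.
f = lambda u: u ** 2
g = lambda x, y: x + y

def nested_mc(n_outer, n_inner):
    """Plain nested Monte Carlo; the bias decays as O(1/n_inner)."""
    x = rng.standard_normal(n_outer)
    y = rng.standard_normal((n_outer, n_inner))
    inner = g(x[:, None], y).mean(axis=1)    # inner estimate for each outer sample
    return f(inner).mean()

for n_inner in (1, 10, 100):
    print(f"n_inner={n_inner:4d}  estimate={nested_mc(50000, n_inner):.3f}")
# n_inner=1 converges to the wrong answer (about 2.0, since E[(x+y)^2] = 2);
# the estimate approaches the true value 1.0 only as n_inner grows.
```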
Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications
Title | Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications |
Authors | Baohua Sun, Lin Yang, Patrick Dong, Wenhan Zhang, Jason Dong, Charles Young |
Abstract | Computer vision performance has been significantly improved in recent years by Convolutional Neural Networks (CNNs). Currently, applications using CNN algorithms are deployed mainly on general-purpose hardware, such as CPUs, GPUs or FPGAs. However, power consumption, speed, accuracy, memory footprint, and die size should all be taken into consideration for mobile and embedded applications. A Domain Specific Architecture (DSA) for CNNs is an efficient and practical solution for CNN deployment and implementation. We designed and produced a 28nm Two-Dimensional CNN-DSA accelerator with an ultra power-efficient performance of 9.3TOPS/Watt, with all processing done in internal memory instead of external DRAM. It classifies 224x224 RGB image inputs at more than 140fps with peak power consumption of less than 300mW and an accuracy comparable to the VGG benchmark. The CNN-DSA accelerator is reconfigurable to support CNN model coefficients of various layer sizes and layer types, including convolution, depth-wise convolution, short-cut connections, max pooling, and ReLU. Furthermore, in order to better support real-world deployment in various application scenarios, especially on low-end mobile and embedded platforms and MCUs (Microcontroller Units), we also designed algorithms to fully utilize the CNN-DSA accelerator efficiently by reducing the dependency on computation resources external to the accelerator, including implementation of Fully-Connected (FC) layers within the accelerator and compression of extracted features from the CNN-DSA accelerator. Live demos with our CNN-DSA accelerator on mobile and embedded systems show its capability to be widely and practically applied in the real world. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00361v1 |
PDF | http://arxiv.org/pdf/1805.00361v1.pdf |
PWC | https://paperswithcode.com/paper/ultra-power-efficient-cnn-domain-specific |
Repo | |
Framework | |
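A back-of-envelope consistency check of the headline numbers; all inputs are taken from the abstract, and counting one multiply-accumulate as one op is an assumption about the vendor's convention.

```python
tops_per_watt = 9.3      # reported power efficiency
peak_power_w = 0.300     # "less than 300mW" peak power
fps = 140                # "more than 140fps" on 224x224 RGB inputs

peak_tops = tops_per_watt * peak_power_w
ops_per_frame = peak_tops * 1e12 / fps
print(f"implied peak throughput: {peak_tops:.2f} TOPS")
print(f"op budget per frame:     {ops_per_frame / 1e9:.0f} GOPs")
# About 2.79 TOPS and 20 GOPs/frame, roughly the scale of a VGG-class model
# (about 15.5 GMACs per 224x224 image), consistent with the claimed
# VGG-comparable accuracy at this power budget.
```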
Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory
Title | Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory |
Authors | Heeyoul Choi |
Abstract | Recurrent neural networks (RNNs) have been drawing much attention with great success in many applications like speech recognition and neural machine translation. Long short-term memory (LSTM) is one of the most popular RNN units in deep learning applications. LSTM transforms the input and the previous hidden state to the next state with an affine transformation, multiplication operations and a nonlinear activation function, which makes a good data representation for a given task. The affine transformation includes rotation and reflection, which change the semantic or syntactic information of dimensions in the hidden states. However, considering that a model interprets the output sequence of LSTM over the whole input sequence, the dimensions of the states need to keep the same type of semantic or syntactic information regardless of the location in the sequence. In this paper, we propose a simple variant of the LSTM unit, the persistent recurrent unit (PRU), where each dimension of the hidden states keeps persistent information across time, so that the space keeps the same meaning over the whole sequence. In addition, to improve the nonlinear transformation power, we add a feedforward layer to the PRU structure. In experiments, we evaluate our proposed methods on three different tasks, and the results confirm that our methods perform better than the conventional LSTM. |
Tasks | Machine Translation, Speech Recognition |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08748v2 |
PDF | http://arxiv.org/pdf/1806.08748v2.pdf |
PWC | https://paperswithcode.com/paper/persistent-hidden-states-and-nonlinear |
Repo | |
Framework | |
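A hedged PyTorch sketch of one plausible reading of the PRU: the hidden state is updated element-wise (no rotation or reflection mixes dimensions, so each dimension's meaning persists), while a separate feedforward layer supplies extra nonlinear transformation power on the output. The exact gating in the paper may differ.

```python
import torch
import torch.nn as nn

class PRUCell(nn.Module):
    """One plausible 'persistent recurrent unit': element-wise state update
    plus an added feedforward layer. An illustrative reading, not the
    paper's exact formulation."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.cand = nn.Linear(input_size, hidden_size)   # candidate from input only
        self.ff = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())

    def forward(self, x, h):
        z = torch.sigmoid(self.gate(torch.cat([x, h], dim=-1)))
        # Element-wise interpolation: dimension i of h only mixes with dimension i.
        h_new = (1 - z) * h + z * torch.tanh(self.cand(x))
        out = self.ff(h_new)          # extra nonlinear transformation on the output
        return out, h_new

cell = PRUCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)
for _ in range(5):
    out, h = cell(torch.randn(1, 8), h)
print(out.shape, h.shape)
```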
Visually grounded cross-lingual keyword spotting in speech
Title | Visually grounded cross-lingual keyword spotting in speech |
Authors | Herman Kamper, Michael Roth |
Abstract | Recent work considered how images paired with speech can be used as supervision for building speech systems when transcriptions are not available. We ask whether visual grounding can be used for cross-lingual keyword spotting: given a text keyword in one language, the task is to retrieve spoken utterances containing that keyword in another language. This could enable searching through speech in a low-resource language using text queries in a high-resource language. As a proof-of-concept, we use English speech with German queries: we use a German visual tagger to add keyword labels to each training image, and then train a neural network to map English speech to German keywords. Without seeing parallel speech-transcriptions or translations, the model achieves a precision at ten of 58%. We show that most erroneous retrievals contain equivalent or semantically relevant keywords; excluding these would improve P@10 to 91%. |
Tasks | Keyword Spotting |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05030v1 |
PDF | http://arxiv.org/pdf/1806.05030v1.pdf |
PWC | https://paperswithcode.com/paper/visually-grounded-cross-lingual-keyword |
Repo | |
Framework | |
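The headline metric is precision at ten: score every utterance for the query keyword, take the ten highest-scoring ones, and measure how many truly contain it. A self-contained sketch with a toy noisy detector in place of the speech network.

```python
import numpy as np

def precision_at_10(scores, has_keyword):
    """P@10: fraction of the 10 highest-scoring utterances that truly
    contain the query keyword (binary labels here are toy stand-ins)."""
    top10 = np.argsort(-scores)[:10]
    return has_keyword[top10].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                  # 1 = utterance has the keyword
scores = labels * 1.0 + rng.normal(0, 0.8, size=500)   # a noisy keyword detector
print(f"P@10 = {precision_at_10(scores, labels):.0%}")
```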
Learning Singularity Avoidance
Title | Learning Singularity Avoidance |
Authors | Jeevan Manavalan, Matthew Howard |
Abstract | With the increase in complexity of robotic systems and the rise in non-expert users, it can be assumed that task constraints are not explicitly known. For tasks where avoiding singularity is critical to success, this paper provides an approach, especially for non-expert users, by which the system learns the constraints contained in a set of demonstrations, such that they can be used to optimise an autonomous controller to avoid singularity without the task constraints being explicitly known. The proposed approach avoids singularity, and thereby unpredictable behaviour when carrying out a task, by maximising the learnt manipulability throughout the motion of the constrained system, and is not limited to kinematic systems. Its benefits are demonstrated through comparisons with other control policies, which show that the constrained manipulability of a system learnt through demonstration can be used to avoid singularities in cases where these other policies would fail. In the absence of the system's manipulability subject to a task's constraints, the proposed approach can instead be used to infer these, with results showing errors of less than 10^-5 in 3DOF simulated systems and less than 10^-2 on a 7DOF real-world robotic system. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04040v2 |
PDF | http://arxiv.org/pdf/1807.04040v2.pdf |
PWC | https://paperswithcode.com/paper/learning-singularity-avoidance |
Repo | |
Framework | |
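The quantity being maximised is Yoshikawa's manipulability, w = sqrt(det(J J^T)), which drops to zero at a singularity. A textbook planar two-link example (not the paper's 3DOF/7DOF systems) makes the behaviour visible.

```python
import numpy as np

def manipulability(J):
    """Yoshikawa's manipulability w = sqrt(det(J J^T)); w -> 0 at a singularity.
    abs() guards against tiny negative determinants from floating-point error."""
    return np.sqrt(np.abs(np.linalg.det(J @ J.T)))

def planar_2dof_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm (a textbook stand-in)."""
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

# Singular when fully stretched (q2 = 0); manipulability peaks near q2 = pi/2.
for q2 in (0.0, 0.5, np.pi / 2):
    w = manipulability(planar_2dof_jacobian(0.3, q2))
    print(f"q2 = {q2:.2f} rad  ->  w = {w:.3f}")
```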
When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size)
Title | When to Finish? Optimal Beam Search for Neural Text Generation (modulo beam size) |
Authors | Liang Huang, Kai Zhao, Mingbo Ma |
Abstract | In neural text generation such as neural machine translation, summarization, and image captioning, beam search is widely used to improve the output text quality. However, in the neural generation setting, hypotheses can finish in different steps, which makes it difficult to decide when to end beam search to ensure optimality. We propose a provably optimal beam search algorithm that will always return the optimal-score complete hypothesis (modulo beam size), and finish as soon as the optimality is established (finishing no later than the baseline). To counter neural generation’s tendency for shorter hypotheses, we also introduce a bounded length reward mechanism which allows a modified version of our beam search algorithm to remain optimal. Experiments on neural machine translation demonstrate that our principled beam search algorithm leads to improvement in BLEU score over previously proposed alternatives. |
Tasks | Image Captioning, Machine Translation, Text Generation |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1809.00069v1 |
PDF | http://arxiv.org/pdf/1809.00069v1.pdf |
PWC | https://paperswithcode.com/paper/when-to-finish-optimal-beam-search-for-neural |
Repo | |
Framework | |
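The optimality argument hinges on scores being sums of log-probabilities, hence non-increasing with length: once the best finished hypothesis outscores every live hypothesis in the beam, no extension can ever overtake it, so search can stop. A compact sketch of that stopping rule on a toy model; the step-function interface and the max_len cap are assumptions of the sketch, and the paper's bounded length reward is not implemented here.

```python
import math

def optimal_beam_search(step_fn, start, beam=4, max_len=20):
    """Beam search that stops as soon as optimality (modulo beam size) holds.

    Every extension adds a log-probability <= 0, so scores only decrease
    with length; once the best finished hypothesis outscores the best live
    one, no extension can overtake it. step_fn(hyp) returns (token, logprob)
    pairs, with token None meaning end-of-sentence.
    """
    live = [(0.0, [start])]
    best_done = (-math.inf, None)
    for _ in range(max_len):
        cand = []
        for score, hyp in live:
            for tok, lp in step_fn(hyp):
                if tok is None:
                    best_done = max(best_done, (score + lp, hyp))
                else:
                    cand.append((score + lp, hyp + [tok]))
        live = sorted(cand, reverse=True)[:beam]
        if not live or best_done[0] >= live[0][0]:
            break                      # provably optimal: stop early
    return best_done

# Toy model: from any prefix, emit 'a' (p=0.5), 'b' (p=0.3) or stop (p=0.2).
def toy_step(hyp):
    return [("a", math.log(0.5)), ("b", math.log(0.3)), (None, math.log(0.2))]

print(optimal_beam_search(toy_step, "<s>"))  # stopping immediately is optimal here
```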
Training Generative Adversarial Networks with Weights
Title | Training Generative Adversarial Networks with Weights |
Authors | Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis, Yannis Stylianou |
Abstract | The impressive success of Generative Adversarial Networks (GANs) is often overshadowed by the difficulties in their training. Despite continuous efforts and improvements, there are still open issues regarding their convergence properties. In this paper, we propose a simple training variation in which suitable weights are defined that assist the training of the Generator. We provide theoretical arguments for why the proposed algorithm is better than the baseline training, in the sense of speeding up the training process and creating a stronger Generator. Performance results show that the new algorithm is more accurate on both synthetic and image datasets, resulting in improvements ranging between 5% and 50%. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02598v1 |
PDF | http://arxiv.org/pdf/1811.02598v1.pdf |
PWC | https://paperswithcode.com/paper/training-generative-adversarial-networks-with |
Repo | |
Framework | |
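A hedged PyTorch sketch of the general recipe: scale each fake sample's generator loss by a weight derived from the discriminator's output. The batch-softmax weighting used below is purely illustrative; the paper derives its own specific weights.

```python
import torch
import torch.nn as nn

# One step of weighted generator training. G maps 2-d noise to 1-d "data".
G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss(reduction="none")

z = torch.randn(64, 2)
logits = D(G(z)).squeeze(1)            # discriminator scores for fake samples
with torch.no_grad():
    w = torch.softmax(logits, dim=0) * len(logits)   # per-sample weights, mean 1
loss_g = (w * bce(logits, torch.ones(64))).mean()    # weighted non-saturating loss

opt_g.zero_grad()
loss_g.backward()
opt_g.step()
print(f"weighted generator loss: {loss_g.item():.3f}")
```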