October 18, 2019

Paper Group ANR 542

Robust and fine-grained prosody control of end-to-end speech synthesis

Title Robust and fine-grained prosody control of end-to-end speech synthesis
Authors Younggun Lee, Taesu Kim
Abstract We propose prosody embeddings for emotional and expressive speech synthesis networks. The proposed methods introduce temporal structures in the embedding networks, thus enabling fine-grained control of the speaking style of the synthesized speech. The temporal structures can be designed either on the speech side or the text side, leading to different control resolutions in time. The prosody embedding networks are plugged into end-to-end speech synthesis networks and trained without any other supervision except for the target speech for synthesizing. It is demonstrated that the prosody embedding networks learned to extract prosodic features. By adjusting the learned prosody features, we could change the pitch and amplitude of the synthesized speech both at the frame level and the phoneme level. We also introduce the temporal normalization of prosody embeddings, which shows better robustness against speaker perturbations during prosody transfer tasks.
Tasks Speech Synthesis
Published 2018-11-06
URL http://arxiv.org/abs/1811.02122v2
PDF http://arxiv.org/pdf/1811.02122v2.pdf
PWC https://paperswithcode.com/paper/robust-and-fine-grained-prosody-control-of
Repo
Framework

Intent Generation for Goal-Oriented Dialogue Systems based on Schema.org Annotations

Title Intent Generation for Goal-Oriented Dialogue Systems based on Schema.org Annotations
Authors Umutcan Şimşek, Dieter Fensel
Abstract Goal-oriented dialogue systems typically communicate with a backend (e.g. database, Web API) to complete certain tasks to reach a goal. The intents that a dialogue system can recognize are mostly added to the system statically by the developer. For an open dialogue system that can work on more than a small set of well-curated data and APIs, this manual intent creation will not be scalable. In this paper, we introduce a straightforward methodology for intent creation based on semantic annotation of data and services on the web. With this method, the Natural Language Understanding (NLU) module of a goal-oriented dialogue system can adapt to newly introduced APIs without requiring heavy developer involvement. We were able to extract intents and necessary slots to be filled from schema.org annotations. We were also able to create a set of initial training sentences for classifying user utterances into the generated intents. We demonstrate our approach on the NLU module of a state-of-the-art dialogue system development framework.
Tasks Goal-Oriented Dialogue Systems
Published 2018-07-03
URL http://arxiv.org/abs/1807.01292v1
PDF http://arxiv.org/pdf/1807.01292v1.pdf
PWC https://paperswithcode.com/paper/intent-generation-for-goal-oriented-dialogue
Repo
Framework

Sample Complexity of Nonparametric Semi-Supervised Learning

Title Sample Complexity of Nonparametric Semi-Supervised Learning
Authors Chen Dan, Liu Leqi, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing
Abstract We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an $\Omega(K\log K)$ labeled sample complexity bound without imposing parametric assumptions, where $K$ is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification ($K>2$), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier. Finally, we describe three algorithms for computing these estimators based on a connection to bipartite graph matching, and perform experiments to illustrate the superiority of the MLE over the majority vote estimator.
Tasks Graph Matching
Published 2018-09-10
URL http://arxiv.org/abs/1809.03073v1
PDF http://arxiv.org/pdf/1809.03073v1.pdf
PWC https://paperswithcode.com/paper/sample-complexity-of-nonparametric-semi
Repo
Framework
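
The abstract above contrasts a majority vote estimator with a matching-based estimator for the permutation that maps mixture components to class labels. The following is a minimal sketch of that contrast, not the paper's estimators: it assumes each labeled sample already carries a component id, it scores a permutation by raw label counts rather than by likelihood, and it searches permutations by brute force instead of bipartite graph matching. All function names are hypothetical.

```python
from itertools import permutations


def count_matrix(cluster_ids, labels, k):
    """counts[c][y] = number of labeled samples in component c with label y."""
    counts = [[0] * k for _ in range(k)]
    for c, y in zip(cluster_ids, labels):
        counts[c][y] += 1
    return counts


def majority_vote(counts):
    """Assign each component its most frequent label (may repeat labels)."""
    return [max(range(len(row)), key=lambda y: row[y]) for row in counts]


def best_permutation(counts):
    """Pick the one-to-one assignment with the largest total count.

    Brute force over all k! permutations, so only suitable for small k.
    """
    k = len(counts)
    best = max(
        permutations(range(k)),
        key=lambda p: sum(counts[c][p[c]] for c in range(k)),
    )
    return list(best)


# Toy labeled sample: component ids and their observed class labels.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 2]
labels = [1, 1, 1, 0, 1, 1, 0, 2]
counts = count_matrix(cluster_ids, labels, k=3)

vote = majority_vote(counts)        # components 0 and 1 both claim label 1
matched = best_permutation(counts)  # forced to be a valid permutation
```

In this toy data the majority vote assigns label 1 to two components, so it is not a permutation, while the matching-based choice resolves the conflict by giving each label to exactly one component.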

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

Title Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect
Authors Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, Liqiang Wang
Abstract Despite being impactful on a variety of problems and applications, generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by Arjovsky and Bottou (2017), who also propose an alternative direction to avoid the caveats in the minmax two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to the inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds the accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.
Tasks
Published 2018-03-05
URL http://arxiv.org/abs/1803.01541v1
PDF http://arxiv.org/pdf/1803.01541v1.pdf
PWC https://paperswithcode.com/paper/improving-the-improved-training-of
Repo
Framework
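
The abstract above relies on the 1-Lipschitz continuity of the discriminator. As a rough illustration of what that constraint means, the sketch below measures how much a function's slope exceeds a bound over sampled point pairs and turns the excess into a penalty. This is a generic one-sided penalty written for illustration; it is not the consistency term proposed in the paper, and `lipschitz_penalty` and its parameters are hypothetical names.

```python
import math


def lipschitz_penalty(f, pairs, bound=1.0):
    """Mean squared excess of |f(x1) - f(x2)| / ||x1 - x2|| over `bound`.

    Returns 0.0 when every sampled pair respects the Lipschitz bound.
    Pairs with identical points contribute nothing.
    """
    if not pairs:
        raise ValueError("need at least one pair of points")
    total = 0.0
    for x1, x2 in pairs:
        dist = math.dist(x1, x2)
        if dist == 0.0:
            continue
        slope = abs(f(x1) - f(x2)) / dist
        total += max(0.0, slope - bound) ** 2
    return total / len(pairs)


pairs = [((0.0,), (1.0,))]
steep = lipschitz_penalty(lambda x: 2.0 * x[0], pairs)   # slope 2 violates the bound
gentle = lipschitz_penalty(lambda x: 0.5 * x[0], pairs)  # slope 0.5 satisfies it
```

A penalty of this kind only checks the sampled pairs, so a value of zero does not prove the function is 1-Lipschitz everywhere.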

Predicting resonant properties of plasmonic structures by deep learning

Title Predicting resonant properties of plasmonic structures by deep learning
Authors Iman Sajedian, Jeonghyun Kim, Junsuk Rho
Abstract Deep learning can be used to extract meaningful results from images. In this paper, we use convolutional neural networks combined with recurrent neural networks on images of plasmonic structures to extract absorption data from them. To provide the required data for the model, we ran 100,000 simulations with similar setups and random structures. By designing a deep network, we found a model that can predict the absorption of any structure with a similar setup. We used convolutional neural networks to get the spatial information from the images, and recurrent neural networks to help the model find the relationships within the spatial information obtained from the convolutional neural network model. With this design, we reached a very low loss in predicting the absorption, compared to the results obtained from numerical simulation, in a very short time.
Tasks
Published 2018-04-19
URL http://arxiv.org/abs/1805.00312v1
PDF http://arxiv.org/pdf/1805.00312v1.pdf
PWC https://paperswithcode.com/paper/predicting-resonant-properties-of-plasmonic
Repo
Framework

Improving Information Extraction from Images with Learned Semantic Models

Title Improving Information Extraction from Images with Learned Semantic Models
Authors Stephan Baier, Yunpu Ma, Volker Tresp
Abstract Many applications require an understanding of an image that goes beyond the simple detection and classification of its objects. In particular, a great deal of semantic information is carried in the relationships between objects. We have previously shown that the combination of a visual model and a statistical semantic prior model can improve on the task of mapping images to their associated scene description. In this paper, we review the model and compare it to a novel conditional multi-way model for visual relationship detection, which does not include an explicitly trained visual prior model. We also discuss potential relationships between the proposed methods and memory models of the human brain.
Tasks
Published 2018-08-27
URL http://arxiv.org/abs/1808.08941v1
PDF http://arxiv.org/pdf/1808.08941v1.pdf
PWC https://paperswithcode.com/paper/improving-information-extraction-from-images
Repo
Framework

Modeling OWL with Rules: The ROWL Protege Plugin

Title Modeling OWL with Rules: The ROWL Protege Plugin
Authors Md. Kamruzzaman Sarker, David Carral, Adila A. Krisnadhi, Pascal Hitzler
Abstract In our experience, some ontology users find it much easier to convey logical statements using rules rather than OWL (or description logic) axioms. Based on recent theoretical developments on transformations between rules and description logics, we develop ROWL, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule.
Tasks
Published 2018-08-30
URL http://arxiv.org/abs/1808.10104v1
PDF http://arxiv.org/pdf/1808.10104v1.pdf
PWC https://paperswithcode.com/paper/modeling-owl-with-rules-the-rowl-protege
Repo
Framework
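
As an illustration of the kind of rule-to-axiom conversion the abstract above describes, here is a standard textbook example of a rule that has an equivalent description logic axiom. The predicate names are hypothetical, and the exact output format of the ROWL plugin is not specified in the abstract, so this shows only the logical correspondence.

```latex
% Rule form (variables x, y implicitly universally quantified):
\mathrm{Person}(x) \wedge \mathrm{hasChild}(x, y) \rightarrow \mathrm{Parent}(x)

% Equivalent description logic axiom:
\mathrm{Person} \sqcap \exists\,\mathrm{hasChild}.\top \sqsubseteq \mathrm{Parent}
```

The variable $y$ occurs only in the rule body, so it can be absorbed into the existential restriction $\exists\,\mathrm{hasChild}.\top$.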

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

Title Generative Adversarial Image Synthesis with Decision Tree Latent Controller
Authors Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
Abstract This paper proposes the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision. To impose a hierarchical inclusion structure on latent variables, we incorporate a new architecture called the DTLC into the generator input. The DTLC has a multiple-layer tree structure in which the ON or OFF of the child node codes is controlled by the parent node codes. By using this architecture hierarchically, we can obtain the latent space in which the lower layer codes are selectively used depending on the higher layer ones. To make the latent codes capture salient semantic features of images in a hierarchically disentangled manner in the DTLC, we also propose a hierarchical conditional mutual information regularization and optimize it with a newly defined curriculum learning method that we propose as well. This makes it possible to discover hierarchically interpretable representations in a layer-by-layer manner on the basis of information gain by only using a single DTLC-GAN model. We evaluated the DTLC-GAN on various datasets, i.e., MNIST, CIFAR-10, Tiny ImageNet, 3D Faces, and CelebA, and confirmed that the DTLC-GAN can learn hierarchically interpretable representations with either unsupervised or weakly supervised settings. Furthermore, we applied the DTLC-GAN to image-retrieval tasks and showed its effectiveness in representation learning.
Tasks Image Generation, Image Retrieval, Representation Learning
Published 2018-05-27
URL http://arxiv.org/abs/1805.10603v1
PDF http://arxiv.org/pdf/1805.10603v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-image-synthesis-with
Repo
Framework
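
The abstract above describes a tree of latent codes in which parent node codes switch child node codes ON or OFF. The following is a minimal two-layer sketch of that gating idea under simplifying assumptions: codes are one-hot, every child group uses the same sampled index, and the gate is a plain multiplication. The function names and layout are hypothetical and are not taken from the paper.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def dtlc_code(parent_idx, child_idx, n_parents, n_children):
    """Build a gated two-layer latent code.

    Layout: [parent one-hot | child group 0 | child group 1 | ...].
    Only the child group under the active parent keeps its code;
    all other child groups are multiplied by 0 (switched OFF).
    """
    parent = one_hot(parent_idx, n_parents)
    code = list(parent)
    for p in range(n_parents):
        child = one_hot(child_idx, n_children)
        code.extend(parent[p] * c for c in child)  # parent gates its children
    return code


# Parent category 1 of 2 is active, and its child code selects entry 0 of 3.
code = dtlc_code(parent_idx=1, child_idx=0, n_parents=2, n_children=3)
```

Here the first child group is all zeros because its parent is inactive, which is one simple way to make lower-layer codes depend on higher-layer ones.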

Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions

Title Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions
Authors Binny Mathew, Ritam Dutt, Suman Kalyan Maity, Pawan Goyal, Animesh Mukherjee
Abstract Anonymity forms an integral and important part of our digital life. It enables us to express our true selves without the fear of judgment. In this paper, we investigate the different aspects of anonymity in the social Q&A site Quora. The choice of Quora is motivated by the fact that this is one of the rare social Q&A sites that allow users to explicitly post anonymous questions and such activity in this forum has become normative rather than a taboo. Through an analysis of 5.1 million questions, we observe that at a global scale almost no difference manifests between the linguistic structure of the anonymous and the non-anonymous questions. We find topical mixing at the global scale to be the primary reason for this absence. However, the differences start to feature once we “deep dive” and (topically) cluster the questions and compare the clusters that have high volumes of anonymous questions with those that have low volumes of anonymous questions. In particular, we observe that the choice to post the question as anonymous is dependent on the user’s perception of anonymity and they often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity. We further perform personality trait analysis and observe that the anonymous group of users has positive correlation with extraversion, agreeableness, and negative correlation with openness. Subsequently, to gain further insights, we build an anonymity grid to identify the differences in the perception of anonymity of the user posting the question and the community of users answering it. We also look into the first response time of the questions and observe that it is lowest for topics which talk about personal and sensitive issues, which hints toward a higher degree of community support and user engagement.
Tasks
Published 2018-11-17
URL http://arxiv.org/abs/1811.07223v1
PDF http://arxiv.org/pdf/1811.07223v1.pdf
PWC https://paperswithcode.com/paper/deep-dive-into-anonymity-a-large-scale
Repo
Framework

Online Detection of Action Start in Untrimmed, Streaming Videos

Title Online Detection of Action Start in Untrimmed, Streaming Videos
Authors Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang
Abstract We aim to tackle a novel task in action detection - Online Detection of Action Start (ODAS) in untrimmed, streaming videos. The goal of ODAS is to detect the start of an action instance, with high categorization accuracy and low detection latency. ODAS is important in many applications such as early alert generation to allow timely security or emergency response. We propose three novel methods to specifically address the challenges in training ODAS models: (1) hard negative samples generation based on Generative Adversarial Network (GAN) to distinguish ambiguous background, (2) explicitly modeling the temporal consistency between data around action start and data succeeding action start, and (3) adaptive sampling strategy to handle the scarcity of training data. We conduct extensive experiments using THUMOS’14 and ActivityNet. We show that our proposed methods lead to significant performance gains and improve the state-of-the-art methods. An ablation study confirms the effectiveness of each proposed method.
Tasks Action Detection
Published 2018-02-19
URL http://arxiv.org/abs/1802.06822v3
PDF http://arxiv.org/pdf/1802.06822v3.pdf
PWC https://paperswithcode.com/paper/online-detection-of-action-start-in-untrimmed
Repo
Framework

Neuromorphic hardware as a self-organizing computing system

Title Neuromorphic hardware as a self-organizing computing system
Authors Lyes Khacef, Bernard Girau, Nicolas Rougier, Andres Upegui, Benoit Miramond
Abstract This paper presents the self-organized neuromorphic architecture named SOMA. The objective is to study neural-based self-organization in computing systems and to prove the feasibility of a self-organizing hardware structure. Considering that these properties emerge from large scale and fully connected neural maps, we will focus on the definition of a self-organizing hardware architecture based on digital spiking neurons that offer hardware efficiency. From a biological point of view, this corresponds to a combination of the so-called synaptic and structural plasticities. We intend to define computational models able to simultaneously self-organize at both computation and communication levels, and we want these models to be hardware-compliant, fault tolerant and scalable by means of a neuro-cellular structure.
Tasks
Published 2018-10-30
URL http://arxiv.org/abs/1810.12640v1
PDF http://arxiv.org/pdf/1810.12640v1.pdf
PWC https://paperswithcode.com/paper/neuromorphic-hardware-as-a-self-organizing
Repo
Framework

Self-Learning to Detect and Segment Cysts in Lung CT Images without Manual Annotation

Title Self-Learning to Detect and Segment Cysts in Lung CT Images without Manual Annotation
Authors Ling Zhang, Vissagan Gopalakrishnan, Le Lu, Ronald M. Summers, Joel Moss, Jianhua Yao
Abstract Image segmentation is a fundamental problem in medical image analysis. In recent years, deep neural networks have achieved impressive performance on many medical image segmentation tasks by supervised learning on large manually annotated data. However, expert annotations on big medical datasets are tedious, expensive or sometimes unavailable. Weakly supervised learning could reduce the effort for annotation but still requires a certain amount of expertise. Recently, deep learning has shown the potential to produce more accurate predictions than the original erroneous labels. Inspired by this, we introduce a very weakly supervised learning method for cystic lesion detection and segmentation in lung CT images, without any manual annotation. Our method works in a self-learning manner, where segmentation generated in previous steps (first by unsupervised segmentation then by neural networks) is used as ground truth for the next level of network learning. Experiments on a cystic lung lesion dataset show that deep learning can perform better than the initial unsupervised annotation, and progressively improve itself after self-learning.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2018-01-25
URL http://arxiv.org/abs/1801.08486v1
PDF http://arxiv.org/pdf/1801.08486v1.pdf
PWC https://paperswithcode.com/paper/self-learning-to-detect-and-segment-cysts-in
Repo
Framework

Learning Sports Camera Selection from Internet Videos

Title Learning Sports Camera Selection from Internet Videos
Authors Jianhui Chen, Keyu Lu, Sijia Tian, James J. Little
Abstract This work addresses camera selection, the task of predicting which camera should be “on air” from multiple candidate cameras for soccer broadcast. The task is challenging because of the scarcity of learning data with all candidate views. Meanwhile, broadcast videos are freely available on the Internet (e.g. YouTube). However, these videos only record the selected camera views, omitting the other candidate views. To overcome this problem, we first introduce a random survival forest (RSF) method to impute the incomplete data effectively. Then, we propose a spatial-appearance heatmap to describe foreground objects (e.g. players and balls) in an image. To evaluate the performance of our system, we collect the largest-ever dataset for soccer broadcasting camera selection. It has one main game which has all candidate views and twelve auxiliary games which only have the broadcast view. Our method significantly outperforms state-of-the-art methods on this challenging dataset. Further analysis suggests that the improvement in performance is indeed from the extra information from auxiliary games.
Tasks
Published 2018-09-08
URL http://arxiv.org/abs/1809.02854v1
PDF http://arxiv.org/pdf/1809.02854v1.pdf
PWC https://paperswithcode.com/paper/learning-sports-camera-selection-from
Repo
Framework

DCSVM: Fast Multi-class Classification using Support Vector Machines

Title DCSVM: Fast Multi-class Classification using Support Vector Machines
Authors Duleep Rathgamage Don, Ionut E. Iacob
Abstract We present DCSVM, an efficient algorithm for multi-class classification using Support Vector Machines. DCSVM is a divide and conquer algorithm which relies on data sparsity in high dimensional space and performs a smart partitioning of the whole training data set into disjoint subsets that are easily separable. A single prediction performed between two partitions eliminates at once one or more classes in one partition, leaving only a reduced number of candidate classes for subsequent steps. The algorithm continues recursively, reducing the number of classes at each step, until a final binary decision is made between the last two classes left in the competition. In the best case scenario, our algorithm makes a final decision between $k$ classes in $O(\log k)$ decision steps and in the worst case scenario DCSVM makes a final decision in $k-1$ steps, which is not worse than the existing techniques.
Tasks
Published 2018-10-23
URL http://arxiv.org/abs/1810.09828v1
PDF http://arxiv.org/pdf/1810.09828v1.pdf
PWC https://paperswithcode.com/paper/dcsvm-fast-multi-class-classification-using
Repo
Framework
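
The step counts quoted in the abstract above ($O(\log k)$ in the best case, $k-1$ in the worst case) follow from how many classes each decision eliminates. The sketch below simulates only that elimination loop. The `eliminate` callback is a hypothetical stand-in for the SVM prediction between two partitions; no SVM is trained here, and the partitioning strategy of DCSVM itself is not reproduced.

```python
def classify_by_elimination(classes, eliminate):
    """Repeatedly drop candidate classes until one remains.

    `eliminate(candidates)` must return a non-empty set of classes to
    remove that leaves at least one candidate. Returns (winner, steps).
    """
    candidates = list(classes)
    steps = 0
    while len(candidates) > 1:
        removed = set(eliminate(candidates))
        remaining = [c for c in candidates if c not in removed]
        if not remaining or len(remaining) == len(candidates):
            raise ValueError("each step must remove some but not all candidates")
        candidates = remaining
        steps += 1
    return candidates[0], steps


k = 8

# Worst case: every decision eliminates exactly one class, giving k - 1 steps.
worst = classify_by_elimination(range(k), lambda cands: {cands[-1]})

# Best case: every decision eliminates half of the candidates, giving log2(k) steps.
best = classify_by_elimination(range(k), lambda cands: set(cands[len(cands) // 2:]))
```

With 8 classes the two toy strategies take 7 and 3 steps respectively, matching the $k-1$ and $\log_2 k$ counts.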

The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making

Title The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making
Authors Luchen Li, Matthieu Komorowski, Aldo A. Faisal
Abstract Off-policy reinforcement learning enables learning a near-optimal policy from suboptimal experience, thereby providing opportunities for artificial intelligence applications in healthcare. Previous works have mainly framed patient-clinician interactions as Markov decision processes, while true physiological states are not necessarily fully observable from clinical data. We capture this situation with a partially observable Markov decision process, in which an agent optimises its actions in a belief represented as a distribution of patient states inferred from individual history trajectories. A Gaussian mixture model is fitted for the observed data. Moreover, we take into account the fact that nuance in pharmaceutical dosage could presumably result in significantly different effects by modelling a continuous policy through a Gaussian approximator directly in the policy space, i.e. the actor. To address the challenge of an infinite number of possible belief states, which renders exact value iteration intractable, we evaluate and plan for only every encountered belief, through a heuristic search tree, by tightly maintaining lower and upper bounds of the true value of belief. We further resort to function approximations to update value bounds estimation, i.e. the critic, so that the tree search can be improved through more compact bounds at the fringe nodes that will be back-propagated to the root. Both actor and critic parameters are learned via gradient-based approaches. Our proposed policy trained from real intensive care unit data is capable of dictating dosing on vasopressors and intravenous fluids for sepsis patients, leading to the best patient outcomes.
Tasks Decision Making
Published 2018-05-29
URL http://arxiv.org/abs/1805.11548v3
PDF http://arxiv.org/pdf/1805.11548v3.pdf
PWC https://paperswithcode.com/paper/the-actor-search-tree-critic-astc-for-off
Repo
Framework
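
The abstract above has the agent act on a belief, that is, a distribution over patient states. As background, the sketch below shows the standard belief update for a discrete POMDP: predict with the transition model, weight by the observation likelihood, then normalize. It assumes a small discrete state space and a fixed action, whereas the paper fits a Gaussian mixture model to observed data; the function name and the toy numbers are hypothetical.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step for a discrete POMDP.

    belief[s]          : prior probability of state s
    transition[s][s2]  : P(s2 | s, a) for the action taken
    obs_likelihood[s2] : P(o | s2) for the observation received
    """
    n = len(belief)
    # Prediction: push the prior through the transition model.
    predicted = [
        sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnormalized]


prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
obs_likelihood = [0.7, 0.2]  # the observation is more likely in state 0
posterior = belief_update(prior, transition, obs_likelihood)
```

Because the set of reachable beliefs is continuous even in this two-state toy, planning only over encountered beliefs, as the abstract describes, avoids enumerating them all.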