Paper Group ANR 1636
“Did You Hear That?” Learning to Play Video Games from Audio Cues
Title | “Did You Hear That?” Learning to Play Video Games from Audio Cues |
Authors | Raluca D. Gaina, Matthew Stephenson |
Abstract | Game-playing AI research has focused for a long time on learning to play video games from visual input or symbolic information. However, humans benefit from a wider array of sensors which we utilise in order to navigate the world around us. In particular, sounds and music are key to how many of us perceive the world and influence the decisions we make. In this paper, we present initial experiments on game-playing agents learning to play video games solely from audio cues. We expand the Video Game Description Language to allow for audio specification, and the General Video Game AI framework to provide new audio games and an API for learning agents to make use of audio observations. We analyse the games and the audio game design process, include initial results with simple Q-Learning agents, and encourage further research in this area. |
Tasks | Q-Learning |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04027v2 |
https://arxiv.org/pdf/1906.04027v2.pdf | |
PWC | https://paperswithcode.com/paper/did-you-hear-that-learning-to-play-video |
Repo | |
Framework | |
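The abstract above mentions simple Q-Learning agents that act on audio observations alone. A minimal sketch of that idea follows, assuming a tabular agent and a toy corridor world invented here for illustration: `loudness`, `step`, `GOAL` and the hyperparameters are hypothetical and are not part of the General Video Game AI framework or the paper's API.

```python
import random

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical corridor with cells 0..4; the goal cell emits a beep


def loudness(position):
    """Hypothetical audio observation: a higher bucket means the beep is louder (closer)."""
    return GOAL - abs(GOAL - position)


def step(position, action):
    """Move one cell, clamp to the corridor, and reward 1.0 on reaching the goal."""
    nxt = max(0, min(GOAL, position + (1 if action == "right" else -1)))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def choose(q, obs, rng, epsilon):
    """Epsilon-greedy action selection with random tie-breaking."""
    values = [q.get((obs, a), 0.0) for a in ACTIONS]
    if rng.random() < epsilon or values[0] == values[1]:
        return rng.choice(ACTIONS)
    return ACTIONS[values.index(max(values))]


def q_update(q, obs, action, reward, next_obs, done, alpha=0.5, gamma=0.9):
    """Tabular rule: Q(o,a) += alpha * (r + gamma * max_a' Q(o',a') - Q(o,a))."""
    best_next = 0.0 if done else max(q.get((next_obs, a), 0.0) for a in ACTIONS)
    old = q.get((obs, action), 0.0)
    q[(obs, action)] = old + alpha * (reward + gamma * best_next - old)


def train(episodes=200, epsilon=0.2, seed=0):
    """Learn a Q-table whose keys are (audio observation, action) pairs."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        pos, done = 0, False
        while not done:
            obs = loudness(pos)
            action = choose(q, obs, rng, epsilon)
            pos, reward, done = step(pos, action)
            q_update(q, obs, action, reward, loudness(pos), done)
    return q


def greedy_action(q, obs):
    """Best known action for an audio observation."""
    return max(ACTIONS, key=lambda a: q.get((obs, a), 0.0))
```

The agent never sees its position, only the loudness bucket; in this toy world the two happen to coincide, which keeps the table small. A real audio game would need a feature extraction step before anything resembling this table lookup.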
NEARBY Platform for Automatic Asteroids Detection and EURONEAR Surveys
Title | NEARBY Platform for Automatic Asteroids Detection and EURONEAR Surveys |
Authors | Dorian Gorgan, Ovidiu Vaduvescu, Teodor Stefanut, Victor Bacu, Adrian Sabou, Denisa Copandean Balazs, Constantin Nandra, Costin Boldea, Afrodita Boldea, Marian Predatu, Viktoria Pinter, Adrian Stanica |
Abstract | The survey of the nearby space and continuous monitoring of the Near Earth Objects (NEOs) and especially Near Earth Asteroids (NEAs) are essential for the future of our planet and should represent a priority for our solar system research and nearby space exploration. More computing power and sophisticated digital tracking algorithms are needed to cope with the larger astronomy imaging cameras dedicated for survey telescopes. The paper presents the NEARBY platform that aims to experiment with new algorithms for automatic image reduction, detection and validation of moving objects in astronomical surveys, specifically NEAs. The NEARBY platform has been developed and tested through a collaborative research effort between the Technical University of Cluj-Napoca (UTCN) and the University of Craiova, Romania, using observing infrastructure of the Instituto de Astrofisica de Canarias (IAC) and Isaac Newton Group (ING), La Palma, Spain. The NEARBY platform has been developed and deployed on the UTCN’s cloud infrastructure and the acquired images are processed remotely by the astronomers who transfer them from ING through the web interface of the NEARBY platform. The paper analyzes and highlights the main aspects of the NEARBY platform development, and the results and conclusions on the EURONEAR surveys. |
Tasks | |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03479v1 |
http://arxiv.org/pdf/1903.03479v1.pdf | |
PWC | https://paperswithcode.com/paper/nearby-platform-for-automatic-asteroids |
Repo | |
Framework | |
Robust subspace clustering by Cauchy loss function
Title | Robust subspace clustering by Cauchy loss function |
Authors | Xuelong Li, Quanmao Lu, Yongsheng Dong, Dacheng Tao |
Abstract | Subspace clustering is a problem of exploring the low-dimensional subspaces of high-dimensional data. State-of-the-art approaches are designed by following the model of spectral clustering based methods. These methods pay much attention to learning the representation matrix to construct a suitable similarity matrix and overlook the influence of the noise term on subspace clustering. However, real data are always contaminated by noise, and the noise usually has a complicated statistical distribution. To alleviate this problem, in this paper we propose a subspace clustering method based on the Cauchy loss function (CLF). In particular, it uses the CLF to penalize the noise term in order to suppress the large noise mixed into the real data. This works because the CLF’s influence function has an upper bound, which limits the influence of a single sample, especially a sample with large noise, on estimating the residuals. Furthermore, we theoretically prove the grouping effect of our proposed method, which means that highly correlated data can be grouped together. Finally, experimental results on five real datasets reveal that our proposed method outperforms several representative clustering methods. |
Tasks | |
Published | 2019-04-28 |
URL | http://arxiv.org/abs/1904.12274v1 |
http://arxiv.org/pdf/1904.12274v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-subspace-clustering-by-cauchy-loss |
Repo | |
Framework | |
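The abstract above relies on the Cauchy loss having a bounded influence function. A short sketch of that property follows, assuming one common parameterisation, rho(r) = (c^2 / 2) * log(1 + (r / c)^2), with a scale `c` chosen here for illustration; the paper's exact form and scaling may differ.

```python
import math


def cauchy_loss(r, c=1.0):
    """Cauchy loss rho(r) = (c^2 / 2) * log(1 + (r / c)^2); c is an assumed scale."""
    return 0.5 * c * c * math.log1p((r / c) ** 2)


def cauchy_influence(r, c=1.0):
    """Influence function psi(r) = d rho / d r = r / (1 + (r / c)^2).

    Its magnitude peaks at |r| = c with value c / 2, so one very noisy
    residual cannot dominate the fit.
    """
    return r / (1.0 + (r / c) ** 2)


def squared_influence(r):
    """Influence of the squared loss 0.5 * r^2, shown for contrast: it is unbounded."""
    return r


if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0, 1000.0):
        print(r, squared_influence(r), cauchy_influence(r))
```

For small residuals the two influence functions nearly agree, while for large residuals the Cauchy influence decays toward zero, which is the robustness argument the abstract makes.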
Comparing two deep learning sequence-based models for protein-protein interaction prediction
Title | Comparing two deep learning sequence-based models for protein-protein interaction prediction |
Authors | Florian Richoux, Charlène Servantie, Cynthia Borès, Stéphane Téletchéa |
Abstract | Biological data are extremely diverse, complex but also quite sparse. The recent developments in deep learning methods are offering new possibilities for the analysis of complex data. However, it is easy to get a deep learning model that seems to have good results but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called “information leak”, is almost never treated in papers proposing deep learning models to predict protein-protein interactions (PPI). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid while predicting PPIs through machine learning methods. Our best model accurately predicts more than 78% of human PPI, in very strict conditions both for training and testing. The methodology we propose here allows us to have strong confidence in the ability of a model to scale up on larger datasets. This would allow sharper models when larger datasets become available, rather than current models prone to information leaks. Our solid methodological foundations should be applicable to more organisms and to whole-proteome network predictions. |
Tasks | |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.06268v1 |
http://arxiv.org/pdf/1901.06268v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-two-deep-learning-sequence-based |
Repo | |
Framework | |
Zygarde: Time-Sensitive On-Device Deep Intelligence on Intermittently-Powered Systems
Title | Zygarde: Time-Sensitive On-Device Deep Intelligence on Intermittently-Powered Systems |
Authors | Bashima Islam, Yubo Luo, Shahriar Nirjon |
Abstract | In this paper, we propose a time-, energy-, and accuracy-aware scheduling algorithm for intermittently powered systems that execute compressed deep learning tasks that are suitable for MCUs and are powered solely by harvested energy. The sporadic nature of harvested energy, resource constraints of the embedded platform, and the computational demand of deep neural networks (even though compressed) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a deep neural network. We develop an imprecise computing-based scheduling algorithm that improves the schedulability of deep learning tasks on intermittently powered systems. We also utilize the dependency of the computational need of data samples for deep learning models and propose early termination of deep neural networks. We further propose a semi-supervised machine learning model that exploits the deep features and contributes in determining the imprecise partition of a task. We implement our proposed algorithms on two different datasets and real-life scenarios and show that it increases the accuracy by 9.45% - 3.19%, decreases the execution time by 14% and successfully schedules 33%-12% more tasks. |
Tasks | |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.03854v1 |
https://arxiv.org/pdf/1905.03854v1.pdf | |
PWC | https://paperswithcode.com/paper/190503854 |
Repo | |
Framework | |
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach
Title | The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach |
Authors | Noé Tits, Kevin El Haddad, Thierry Dutoit |
Abstract | As part of the Human-Computer Interaction field, Expressive speech synthesis is a very rich domain as it requires knowledge in areas such as machine learning, signal processing, sociology, psychology. In this Chapter, we will focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will have an overview of the main paradigms used in this field, through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric and statistical parametric speech synthesis. Finally, we focus on the last one, with the last techniques modeling Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of Deep Learning blocks such as Convolutional and Recurrent Neural Networks as well as Attention Mechanism. The last part of the Chapter intends to assemble the different aspects of the theory and summarize the concepts. |
Tasks | Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06234v1 |
https://arxiv.org/pdf/1910.06234v1.pdf | |
PWC | https://paperswithcode.com/paper/the-theory-behind-controllable-expressive |
Repo | |
Framework | |
Deep Self-Learning From Noisy Labels
Title | Deep Self-Learning From Noisy Labels |
Authors | Jiangfan Han, Ping Luo, Xiaogang Wang |
Abstract | ConvNets achieve good results when training from clean data, but learning from noisy labels significantly degrades performances and remains challenging. Unlike previous works constrained by many conditions, making them infeasible to real noisy cases, this work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision. The proposed approach has several appealing benefits. (1) Different from most existing work, it does not rely on any assumption on the distribution of the noisy labels, making it robust to real noises. (2) It does not need extra clean supervision or accessorial network to help training. (3) A self-learning framework is proposed to train the network in an iterative end-to-end manner, which is effective and efficient. Extensive experiments in challenging benchmarks such as Clothing1M and Food101-N show that our approach outperforms its counterparts in all empirical settings. |
Tasks | Image Classification |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02160v2 |
https://arxiv.org/pdf/1908.02160v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-self-learning-from-noisy-labels |
Repo | |
Framework | |
Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
Title | Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models? |
Authors | Lyan Verwimp, Jerome R. Bellegarda |
Abstract | Natural language processing (NLP) tasks tend to suffer from a paucity of suitably annotated training data, hence the recent success of transfer learning across a wide variety of them. The typical recipe involves: (i) training a deep, possibly bidirectional, neural network with an objective related to language modeling, for which training data is plentiful; and (ii) using the trained network to derive contextual representations that are far richer than standard linear word embeddings such as word2vec, and thus result in important gains. In this work, we wonder whether the opposite perspective is also true: can contextual representations trained for different NLP tasks improve language modeling itself? Since language models (LMs) are predominantly locally optimized, other NLP tasks may help them make better predictions based on the entire semantic fabric of a document. We test the performance of several types of pre-trained embeddings in neural LMs, and we investigate whether it is possible to make the LM more aware of global semantic information through embeddings pre-trained with a domain classification model. Initial experiments suggest that as long as the proper objective criterion is used during training, pre-trained embeddings are likely to be beneficial for neural language modeling. |
Tasks | Language Modelling, Transfer Learning, Word Embeddings |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04130v1 |
https://arxiv.org/pdf/1909.04130v1.pdf | |
PWC | https://paperswithcode.com/paper/reverse-transfer-learning-can-word-embeddings |
Repo | |
Framework | |
Effective Incorporation of Speaker Information in Utterance Encoding in Dialog
Title | Effective Incorporation of Speaker Information in Utterance Encoding in Dialog |
Authors | Tianyu Zhao, Tatsuya Kawahara |
Abstract | In dialog studies, we often encode a dialog using a hierarchical encoder where each utterance is converted into an utterance vector, and then a sequence of utterance vectors is converted into a dialog vector. Since knowing who produced which utterance is essential to understanding a dialog, conventional methods tried integrating speaker labels into utterance vectors. We found the method problematic in some cases where speaker annotations are inconsistent among different dialogs. A relative speaker modeling method is proposed to address the problem. Experimental evaluations on dialog act recognition and response generation show that the proposed method yields superior and more consistent performances. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05599v1 |
https://arxiv.org/pdf/1907.05599v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-incorporation-of-speaker |
Repo | |
Framework | |
Embedding Human Knowledge into Deep Neural Network via Attention Map
Title | Embedding Human Knowledge into Deep Neural Network via Attention Map |
Authors | Masahiro Mitsuhara, Hiroshi Fukui, Yusuke Sakashita, Takanori Ogata, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
Abstract | In this work, we aim to realize a method for embedding human knowledge into deep neural networks. While the conventional method to embed human knowledge has been applied for non-deep machine learning, it is challenging to apply it for deep learning models due to the enormous number of model parameters. To tackle this problem, we focus on the attention mechanism of an attention branch network (ABN). In this paper, we propose a fine-tuning method that utilizes a single-channel attention map which is manually edited by a human expert. Our fine-tuning method can train a network so that the output attention map corresponds to the edited ones. As a result, the fine-tuned network can output an attention map that takes into account human knowledge. Experimental results with ImageNet, CUB-200-2010, and IDRiD demonstrate that it is possible to obtain a clear attention map for a visual explanation and improve the classification performance. Our findings can be a novel framework for optimizing networks through human intuitive editing via a visual interface and suggest new possibilities for human-machine cooperation in addition to the improvement of visual explanations. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03540v4 |
https://arxiv.org/pdf/1905.03540v4.pdf | |
PWC | https://paperswithcode.com/paper/190503540 |
Repo | |
Framework | |
Learning to Synthesize Fashion Textures
Title | Learning to Synthesize Fashion Textures |
Authors | Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy |
Abstract | Existing unconditional generative models mainly focus on modeling general objects, such as faces and indoor scenes. Fashion textures, another important type of visual elements around us, have not been extensively studied. In this work, we propose an effective generative model for fashion textures and also comprehensively investigate the key components involved: internal representation, latent space sampling and the generator architecture. We use Gram matrix as a suitable internal representation for modeling realistic fashion textures, and further design two dedicated modules for modulating Gram matrix into a low-dimension vector. Since fashion textures are scale-dependent, we propose a recursive auto-encoder to capture the dependency between multiple granularity levels of texture feature. Another important observation is that fashion textures are multi-modal. We fit and sample from a Gaussian mixture model in the latent space to improve the diversity of the generated textures. Extensive experiments demonstrate that our approach is capable of synthesizing more realistic and diverse fashion textures over other state-of-the-art methods. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07472v1 |
https://arxiv.org/pdf/1911.07472v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-synthesize-fashion-textures |
Repo | |
Framework | |
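The abstract above uses the Gram matrix of feature maps as the internal texture representation. A minimal sketch of that computation follows, assuming the features arrive as `C` channels that are each flattened to `N` spatial positions, and assuming normalisation by `N`; the function name and layout are illustrative, not the paper's code.

```python
def gram_matrix(features):
    """Return the C x C Gram matrix G[i][j] = (1 / N) * sum_k F[i][k] * F[j][k].

    `features` is a list of C channels, each a flat list of N activations.
    """
    n = len(features[0])
    if any(len(channel) != n for channel in features):
        raise ValueError("all channels must have the same number of positions")
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


if __name__ == "__main__":
    feats = [[1, 2, 3, 4], [0, 1, 0, 1]]
    print(gram_matrix(feats))
```

Because the sum runs over spatial positions, shuffling those positions leaves the matrix unchanged. That is why it records which channels fire together rather than where they fire, and so suits texture statistics.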
Air-Writing Translater: A Novel Unsupervised Domain Adaptation Method for Inertia-Trajectory Translation of In-air Handwriting
Title | Air-Writing Translater: A Novel Unsupervised Domain Adaptation Method for Inertia-Trajectory Translation of In-air Handwriting |
Authors | Songbin Xu, Yang Xue, Xin Zhang, Lianwen Jin |
Abstract | As a new way of human-computer interaction, inertial sensor based in-air handwriting can provide a natural and unconstrained interaction to express more complex and richer information in 3D space. However, most of the existing in-air handwriting work is mainly focused on handwritten character recognition, which makes these works suffer from poor readability of the inertial signal and a lack of labeled samples. To address these two problems, we use an unsupervised domain adaptation method to reconstruct the trajectory of the inertial signal and generate inertial samples using online handwritten trajectories. In this paper, we propose an AirWriting Translater model to learn the bi-directional translation between the trajectory domain and the inertial domain in the absence of paired inertial and trajectory samples. Through semantic-level adversarial training and latent classification loss, the proposed model learns to extract domain-invariant content between inertial signal and trajectory, while preserving semantic consistency during the translation across the two domains. We carefully design the architecture, so that the proposed framework can accept inputs of arbitrary length and translate between different sampling rates. We also conduct experiments on two public datasets: 6DMG (in-air handwriting dataset) and CT (handwritten trajectory dataset). The results on the two datasets demonstrate that the proposed network succeeds in both Inertia-to-Trajectory and Trajectory-to-Inertia translation tasks. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.05649v1 |
https://arxiv.org/pdf/1911.05649v1.pdf | |
PWC | https://paperswithcode.com/paper/air-writing-translater-a-novel-unsupervised |
Repo | |
Framework | |
Multi-Task Self-Supervised Learning for Disfluency Detection
Title | Multi-Task Self-Supervised Learning for Disfluency Detection |
Authors | Shaolei Wang, Wanxiang Che, Qi Liu, Pengda Qin, Ting Liu, William Yang Wang |
Abstract | Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words; (ii) a sentence classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard. |
Tasks | Sentence Classification |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05378v1 |
https://arxiv.org/pdf/1908.05378v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-self-supervised-learning-for |
Repo | |
Framework | |
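The abstract above builds pseudo training data by randomly adding or deleting words in unlabeled sentences. A minimal sketch of that construction follows, assuming per-token corruption probabilities and a flat word list to draw noise from; `p_add`, `p_del`, `vocab` and the function names are illustrative choices, not values or code from the paper.

```python
import random


def add_noise_words(tokens, vocab, rng, p_add=0.15):
    """Tagging-task data: insert random words; label 1 = inserted, 0 = original."""
    noisy, labels = [], []
    for tok in tokens:
        if rng.random() < p_add:
            noisy.append(rng.choice(vocab))
            labels.append(1)
        noisy.append(tok)
        labels.append(0)
    return noisy, labels


def delete_words(tokens, rng, p_del=0.15):
    """Drop each token independently with probability p_del."""
    return [tok for tok in tokens if rng.random() >= p_del]


def make_classification_example(tokens, vocab, rng):
    """Sentence-classification data: label 0 = original, 1 = corrupted.

    A corrupted sentence can occasionally equal the original when no word
    happens to be added or deleted; a real pipeline would filter those cases.
    """
    if rng.random() < 0.5:
        return list(tokens), 0
    if rng.random() < 0.5:
        corrupted, _ = add_noise_words(tokens, vocab, rng)
    else:
        corrupted = delete_words(tokens, rng)
    return corrupted, 1


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the committee approved the budget on friday".split()
    vocab = ["uh", "well", "i", "mean", "report", "city"]
    print(add_noise_words(sentence, vocab, rng))
    print(make_classification_example(sentence, vocab, rng))
```

Both label sets come for free from the corruption procedure itself, which is what lets the pre-training data be collected without manual annotation.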
Human-like machine thinking: Language guided imagination
Title | Human-like machine thinking: Language guided imagination |
Authors | Feng Qi, Wenchuan Wu |
Abstract | Human thinking requires the brain to understand the meaning of language expression and to properly organize the thoughts flow using the language. However, current natural language processing models are primarily limited in the word probability estimation. Here, we proposed a Language guided imagination (LGI) network to incrementally learn the meaning and usage of numerous words and syntaxes, aiming to form a human-like machine thinking process. LGI contains three subsystems: (1) vision system that contains an encoder to disentangle the input or imagined scenarios into abstract population representations, and an imagination decoder to reconstruct imagined scenario from higher level representations; (2) Language system, that contains a binarizer to transfer symbol texts into binary vectors, an IPS (mimicking the human IntraParietal Sulcus, implemented by an LSTM) to extract the quantity information from the input texts, and a textizer to convert binary vectors into text symbols; (3) a PFC (mimicking the human PreFrontal Cortex, implemented by an LSTM) to combine inputs of both language and vision representations, and predict text symbols and manipulated images accordingly. LGI has incrementally learned eight different syntaxes (or tasks), with which a machine thinking loop has been formed and validated by the proper interaction between language and vision system. The paper provides a new architecture to let the machine learn, understand and use language in a human-like way that could ultimately enable a machine to construct fictitious ‘mental’ scenario and possess intelligence. |
Tasks | |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.07562v2 |
https://arxiv.org/pdf/1905.07562v2.pdf | |
PWC | https://paperswithcode.com/paper/human-like-machine-thinking-language-guided |
Repo | |
Framework | |
InceptionGCN: Receptive Field Aware Graph Convolutional Network for Disease Prediction
Title | InceptionGCN: Receptive Field Aware Graph Convolutional Network for Disease Prediction |
Authors | Anees Kazi, Shayan Shekarforoush, S. Arvind Krishna, Hendrik Burwinkel, Gerome Vivar, Karsten Kortuem, Seyed-Ahmad Ahmadi, Shadi Albarqouni, Nassir Navab |
Abstract | Geometric deep learning provides a principled and versatile manner for the integration of imaging and non-imaging modalities in the medical domain. Graph Convolutional Networks (GCNs) in particular have been explored on a wide variety of problems such as disease prediction, segmentation, and matrix completion by leveraging large, multimodal datasets. In this paper, we introduce a new spectral domain architecture for deep learning on graphs for disease prediction. The novelty lies in defining geometric ‘inception modules’ which are capable of capturing intra- and inter-graph structural heterogeneity during convolutions. We design filters with different kernel sizes to build our architecture. We show our disease prediction results on two publicly available datasets. Further, we provide insights on the behaviour of regular GCNs and our proposed model under varying input scenarios on simulated data. |
Tasks | Disease Prediction, Matrix Completion |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04233v1 |
http://arxiv.org/pdf/1903.04233v1.pdf | |
PWC | https://paperswithcode.com/paper/inceptiongcn-receptive-field-aware-graph |
Repo | |
Framework | |