January 27, 2020

Paper Group ANR 1242

PuVAE: A Variational Autoencoder to Purify Adversarial Examples. Topology-preserving augmentation for CNN-based segmentation of congenital heart defects from 3D paediatric CMR. What relations are reliably embeddable in Euclidean space?. Exploring Multilingual Syntactic Sentence Representations. Two-Stream Multi-Channel Convolutional Neural Network …

PuVAE: A Variational Autoencoder to Purify Adversarial Examples


Title	PuVAE: A Variational Autoencoder to Purify Adversarial Examples
Authors	Uiwon Hwang, Jaewoo Park, Hyemi Jang, Sungroh Yoon, Nam Ik Cho
Abstract	Deep neural networks are widely used and exhibit excellent performance in many areas. However, they are vulnerable to adversarial attacks that compromise the network at inference time by applying elaborately designed perturbations to input data. Although several defense methods have been proposed to address specific attacks, other attack methods can circumvent these defense mechanisms. Therefore, we propose Purifying Variational Autoencoder (PuVAE), a method to purify adversarial examples. The proposed method eliminates an adversarial perturbation by projecting an adversarial example on the manifold of each class, and determines the closest projection as a purified sample. We experimentally illustrate the robustness of PuVAE against various attack methods without any prior knowledge. In our experiments, the proposed method exhibits performance competitive with state-of-the-art defense methods, and its inference time is approximately 130 times faster than that of Defense-GAN, the state-of-the-art purifier model.
Tasks
Published	2019-03-02
URL	http://arxiv.org/abs/1903.00585v1
PDF	http://arxiv.org/pdf/1903.00585v1.pdf
PWC	https://paperswithcode.com/paper/puvae-a-variational-autoencoder-to-purify
Repo
Framework
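
As a rough illustration of the selection step the PuVAE abstract describes (project the input onto each class manifold, then keep the closest projection), the sketch below uses hypothetical stand-ins. The names `purify`, `squared_distance`, and `toy_projectors` are assumptions made for this example and are not the authors' code. In the paper the projections come from a trained VAE; here each "manifold" is simply a horizontal line.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def purify(
    x: Vector, projectors: Dict[int, Callable[[Vector], Vector]]
) -> Tuple[int, Vector]:
    """Project x with every class projector and return (label, projection)
    for the projection that lies closest to x."""
    if not projectors:
        raise ValueError("at least one class projector is required")
    best_label = min(
        projectors, key=lambda label: squared_distance(x, projectors[label](x))
    )
    return best_label, projectors[best_label](x)


# Toy stand-ins for class manifolds (assumption): class 0 is the line y = 0,
# class 1 is the line y = 1. A real system would use learned decoders instead.
toy_projectors = {
    0: lambda v: [v[0], 0.0],
    1: lambda v: [v[0], 1.0],
}

label, purified = purify([0.3, 0.1], toy_projectors)
print(label, purified)
```

The purified sample, rather than the raw input, would then be passed to the downstream classifier.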

Topology-preserving augmentation for CNN-based segmentation of congenital heart defects from 3D paediatric CMR


Title	Topology-preserving augmentation for CNN-based segmentation of congenital heart defects from 3D paediatric CMR
Authors	Nick Byrne, James R. Clough, Isra Valverde, Giovanni Montana, Andrew P. King
Abstract	Patient-specific 3D printing of congenital heart anatomy demands an accurate segmentation of the thin tissue interfaces which characterise these diagnoses. Even when a label set has a high spatial overlap with the ground truth, inaccurate delineation of these interfaces can result in topological errors. These compromise the clinical utility of such models due to the anomalous appearance of defects. CNNs have achieved state-of-the-art performance in segmentation tasks. Whilst data augmentation has often played an important role, we show that conventional image resampling schemes used therein can introduce topological changes in the ground truth labelling of augmented samples. We present a novel pipeline to correct for these changes, using a fast-marching algorithm to enforce the topology of the ground truth labels within their augmented representations. In so doing, we invoke the idea of cardiac contiguous topology to describe an arbitrary combination of congenital heart defects and develop an associated, clinically meaningful metric to measure the topological correctness of segmentations. In a series of five-fold cross-validations, we demonstrate the performance gain produced by this pipeline and the relevance of topological considerations to the segmentation of congenital heart defects. We speculate as to the applicability of this approach to any segmentation task involving morphologically complex targets.
Tasks	Data Augmentation
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08870v1
PDF	https://arxiv.org/pdf/1908.08870v1.pdf
PWC	https://paperswithcode.com/paper/topology-preserving-augmentation-for-cnn
Repo
Framework

What relations are reliably embeddable in Euclidean space?


Title	What relations are reliably embeddable in Euclidean space?
Authors	Robi Bhattacharjee, Sanjoy Dasgupta
Abstract	We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space. For three types of embeddings motivated by the recent literature on knowledge graphs, we obtain characterizations of which relations they are able to capture, as well as bounds on the minimal dimensionality and precision needed.
Tasks	Knowledge Graphs
Published	2019-03-13
URL	http://arxiv.org/abs/1903.05347v1
PDF	http://arxiv.org/pdf/1903.05347v1.pdf
PWC	https://paperswithcode.com/paper/what-relations-are-reliably-embeddable-in
Repo
Framework

Exploring Multilingual Syntactic Sentence Representations


Title	Exploring Multilingual Syntactic Sentence Representations
Authors	Chen Liu, Anderson de Andrade, Muhammad Osama
Abstract	We study methods for learning sentence embeddings with syntactic structure. We focus on methods of learning syntactic sentence-embeddings by using a multilingual parallel-corpus augmented by Universal Parts-of-Speech tags. We evaluate the quality of the learned embeddings by examining sentence-level nearest neighbours and functional dissimilarity in the embedding space. We also evaluate the ability of the method to learn syntactic sentence-embeddings for low-resource languages and demonstrate strong evidence for transfer learning. Our results show that syntactic sentence-embeddings can be learned while using less training data, fewer model parameters, and resulting in better evaluation metrics than state-of-the-art language models.
Tasks	Sentence Embeddings, Transfer Learning
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11768v1
PDF	https://arxiv.org/pdf/1910.11768v1.pdf
PWC	https://paperswithcode.com/paper/exploring-multilingual-syntactic-sentence
Repo
Framework

Two-Stream Multi-Channel Convolutional Neural Network (TM-CNN) for Multi-Lane Traffic Speed Prediction Considering Traffic Volume Impact


Title	Two-Stream Multi-Channel Convolutional Neural Network (TM-CNN) for Multi-Lane Traffic Speed Prediction Considering Traffic Volume Impact
Authors	Ruimin Ke, Wan Li, Zhiyong Cui, Yinhai Wang
Abstract	Traffic speed prediction is a critically important component of intelligent transportation systems (ITS). Recently, with the rapid development of deep learning and transportation data science, a growing body of new traffic speed prediction models have been designed, which achieved high accuracy and large-scale prediction. However, existing studies have two major limitations. First, they predict aggregated traffic speed rather than lane-level traffic speed; second, most studies ignore the impact of other traffic flow parameters in speed prediction. To address these issues, we propose a two-stream multi-channel convolutional neural network (TM-CNN) model for multi-lane traffic speed prediction considering traffic volume impact. In this model, we first introduce a new data conversion method that converts raw traffic speed data and volume data into spatial-temporal multi-channel matrices. Then we carefully design a two-stream deep neural network to effectively learn the features and correlations between individual lanes, in the spatial-temporal dimensions, and between speed and volume. Accordingly, a new loss function that considers the volume impact in speed prediction is developed. A case study using one-year data validates the TM-CNN model and demonstrates its superiority. This paper contributes to two research areas: (1) traffic speed prediction, and (2) multi-lane traffic flow study.
Tasks
Published	2019-03-05
URL	http://arxiv.org/abs/1903.01678v1
PDF	http://arxiv.org/pdf/1903.01678v1.pdf
PWC	https://paperswithcode.com/paper/two-stream-multi-channel-convolutional-neural
Repo
Framework

Collaborating with Users in Proximity for Decentralized Mobile Recommender Systems


Title	Collaborating with Users in Proximity for Decentralized Mobile Recommender Systems
Authors	Felix Beierle, Tobias Eichinger
Abstract	Typically, recommender systems from any domain, be it movies, music, restaurants, etc., are organized in a centralized fashion. The service provider holds all the data, biases in the recommender algorithms are not transparent to the user, and the service providers often create lock-in effects making it inconvenient for the user to switch providers. In this paper, we argue that the user’s smartphone already holds a lot of the data that feeds into typical recommender systems for movies, music, or POIs. With the ubiquity of the smartphone and other users in proximity in public places or public transportation, data can be exchanged directly between users in a device-to-device manner. This way, each smartphone can build its own database and calculate its own recommendations. One of the benefits of such a system is that it is not restricted to recommendations for just one user - ad-hoc group recommendations are also possible. While the infrastructure for such a platform already exists - the smartphones already in the palms of the users - there are challenges both with respect to the mobile recommender system platform as well as to its recommender algorithms. In this paper, we present a mobile architecture for the described system - consisting of data collection, data exchange, and recommender system - and highlight its challenges and opportunities.
Tasks	Recommendation Systems
Published	2019-06-07
URL	https://arxiv.org/abs/1906.03114v1
PDF	https://arxiv.org/pdf/1906.03114v1.pdf
PWC	https://paperswithcode.com/paper/collaborating-with-users-in-proximity-for
Repo
Framework

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation


Title	MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation
Authors	Xia Liang, Junmin Wu, Yan Yin
Abstract	Most existing neural network models for music generation explore how to generate music bars, then directly splice the music bars into a song. However, these methods do not explore the relationship between the bars, and the connected song as a whole has no musical form structure or sense of musical direction. To address this issue, we propose a Multi-model Multi-task Hierarchical Conditional VAE-GAN (Variational Autoencoder-Generative Adversarial Network), named MIDI-Sandwich, which combines musical knowledge, such as musical form, tonic, and melodic motion. MIDI-Sandwich has two submodels: a Hierarchical Conditional Variational Autoencoder (HCVAE) and a Hierarchical Conditional Generative Adversarial Network (HCGAN). The HCVAE uses a hierarchical structure. The lower layer of the HCVAE uses a Local Conditional Variational Autoencoder (L-CVAE) to generate a music bar that is pre-specified by its First and Last Notes (FLN). The upper layer of the HCVAE uses a Global Variational Autoencoder (G-VAE) to analyze the latent vector sequence generated by the L-CVAE encoder, to explore the musical relationship between the bars, and to produce a song pieced together from multiple music bars generated by the L-CVAE decoder, which gives the song both musical structure and a sense of direction. At the same time, the HCVAE shares a part of itself with the HCGAN to further improve the performance of the generated music. MIDI-Sandwich is validated on the Nottingham dataset and is able to generate a single-track melody sequence (17x8 beats), which is longer than the output of most generative models (8 to 32 beats). The quality of the generated music is also evaluated, following experimental methods from the established literature. These experiments demonstrate the validity of the model.
Tasks	Music Generation
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01607v2
PDF	https://arxiv.org/pdf/1907.01607v2.pdf
PWC	https://paperswithcode.com/paper/midi-sandwich-multi-model-multi-task
Repo
Framework

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning


Title	A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning
Authors	Ahmad Asadi, Reza Safabakhsh
Abstract	Generating textual descriptions for images has been an attractive problem for the computer vision and natural language processing researchers in recent years. Dozens of models based on deep learning have been proposed to solve this problem. The existing approaches are based on neural encoder-decoder structures equipped with the attention mechanism. These methods strive to train decoders to minimize the log likelihood of the next word in a sentence given the previous ones, which results in the sparsity of the output space. In this work, we propose a new approach to train decoders to regress the word embedding of the next word with respect to the previous ones instead of minimizing the log likelihood. The proposed method is able to learn and extract long-term information and can generate longer fine-grained captions without introducing any external memory cell. Furthermore, decoders trained by the proposed technique can take the importance of the generated words into consideration while generating captions. In addition, a novel semantic attention mechanism is proposed that guides attention points through the image, taking the meaning of the previously generated word into account. We evaluate the proposed approach with the MS-COCO dataset. The proposed model outperformed the state of the art models especially in generating longer captions. It achieved a CIDEr score equal to 125.0 and a BLEU-4 score equal to 50.5, while the best scores of the state of the art models are 117.1 and 48.0, respectively.
Tasks	Image Captioning
Published	2019-06-26
URL	https://arxiv.org/abs/1906.12188v1
PDF	https://arxiv.org/pdf/1906.12188v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-decoder-structure-based-on
Repo
Framework

Weakly Labeled Sound Event Detection Using Tri-training and Adversarial Learning


Title	Weakly Labeled Sound Event Detection Using Tri-training and Adversarial Learning
Authors	Hyoungwoo Park, Sungrack Yun, Jungyun Eum, Janghoon Cho, Kyuwoong Hwang
Abstract	This paper considers a semi-supervised learning framework for weakly labeled polyphonic sound event detection in Task 4 of the DCASE 2019 challenge, combining tri-training and adversarial learning. The goal of Task 4 is to detect the onsets and offsets of multiple sound events in a single audio clip. The dataset consists of synthetic data with strong labels (sound event labels with boundaries) and real data that is either weakly labeled (sound event labels) or unlabeled. Given this dataset, we apply tri-training, in which two different classifiers are used to obtain pseudo labels on the weakly labeled and unlabeled data, and the final classifier is trained using the strongly labeled data together with the weakly labeled and unlabeled data with pseudo labels. We also apply adversarial learning to reduce the domain gap between the real and synthetic data. We evaluated our learning framework on the validation set of the Task 4 dataset; in the experiments, it shows a considerable performance improvement over the baseline model.
Tasks	Sound Event Detection
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06790v1
PDF	https://arxiv.org/pdf/1910.06790v1.pdf
PWC	https://paperswithcode.com/paper/weakly-labeled-sound-event-detection-using
Repo
Framework
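
The pseudo-labelling step mentioned in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the common tri-training rule of keeping a pseudo label only when two classifiers agree, and it treats each clip as having a single label, whereas the actual task is polyphonic. The function name `agreed_pseudo_labels` and the example clip names are hypothetical and do not come from the paper.

```python
from typing import Dict, Sequence


def agreed_pseudo_labels(
    clips: Sequence[str],
    preds_a: Sequence[str],
    preds_b: Sequence[str],
) -> Dict[str, str]:
    """Return {clip: label} for clips where both classifiers predict the same label.

    Clips on which the two classifiers disagree receive no pseudo label.
    """
    if not (len(clips) == len(preds_a) == len(preds_b)):
        raise ValueError("clips and both prediction lists must have equal length")
    return {clip: a for clip, a, b in zip(clips, preds_a, preds_b) if a == b}


# Hypothetical predictions from two classifiers on three unlabeled clips.
clips = ["clip_1", "clip_2", "clip_3"]
preds_a = ["dog", "speech", "alarm"]
preds_b = ["dog", "cat", "alarm"]

print(agreed_pseudo_labels(clips, preds_a, preds_b))
```

Only `clip_1` and `clip_3` receive pseudo labels here; a third classifier would then be trained on the labeled data plus these pseudo-labeled clips.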

A Dual-Hormone Closed-Loop Delivery System for Type 1 Diabetes Using Deep Reinforcement Learning


Title	A Dual-Hormone Closed-Loop Delivery System for Type 1 Diabetes Using Deep Reinforcement Learning
Authors	Taiyu Zhu, Kezhi Li, Pantelis Georgiou
Abstract	We propose a dual-hormone delivery strategy by exploiting deep reinforcement learning (RL) for people with Type 1 Diabetes (T1D). Specifically, double dilated recurrent neural networks (RNN) are used to learn the hormone delivery strategy, trained by a variant of Q-learning, whose inputs are raw glucose and meal carbohydrate data and whose outputs are dual-hormone (insulin and glucagon) delivery. Without prior knowledge of the glucose-insulin metabolism, we run the method on the UVA/Padova simulator. Hundreds of days of self-play are performed to obtain a generalized model, then importance sampling is adopted to customize the model for personal use. In silico, the proposed strategy achieves a glucose time in target range (TIR) of 93% for adults and 83% for adolescents given standard bolus, outperforming previous approaches significantly. The results indicate that deep RL is effective in building personalized hormone delivery strategies for people with T1D.
Tasks	Q-Learning
Published	2019-10-09
URL	https://arxiv.org/abs/1910.04059v1
PDF	https://arxiv.org/pdf/1910.04059v1.pdf
PWC	https://paperswithcode.com/paper/a-dual-hormone-closed-loop-delivery-system
Repo
Framework
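
The time-in-range (TIR) figure reported in the abstract above is a fraction of glucose readings that fall inside a target band. The sketch below shows one way such a metric could be computed. The 70 to 180 mg/dL bounds are an assumption (a commonly used clinical target band); the abstract does not state which range the authors used, and `time_in_range` is a hypothetical helper name.

```python
from typing import Sequence


def time_in_range(
    readings_mg_dl: Sequence[float],
    low: float = 70.0,   # assumed lower bound of the target band (mg/dL)
    high: float = 180.0,  # assumed upper bound of the target band (mg/dL)
) -> float:
    """Fraction of equally spaced glucose readings with low <= value <= high."""
    if not readings_mg_dl:
        raise ValueError("at least one glucose reading is required")
    in_range = sum(1 for g in readings_mg_dl if low <= g <= high)
    return in_range / len(readings_mg_dl)


# Four hypothetical readings: two inside the band, two outside.
print(time_in_range([60, 100, 150, 200]))
```

Because the readings are assumed to be equally spaced in time, the fraction of readings in range equals the fraction of time in range.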

What graph neural networks cannot learn: depth vs width


Title	What graph neural networks cannot learn: depth vs width
Authors	Andreas Loukas
Abstract	This paper studies the expressive power of graph neural networks falling within the message-passing framework (GNNmp). Two results are presented. First, GNNmp are shown to be Turing universal under sufficient conditions on their depth, width, node attributes, and layer expressiveness. Second, it is discovered that GNNmp can lose a significant portion of their power when their depth and width are restricted. The proposed impossibility statements stem from a new technique that enables the repurposing of seminal results from distributed computing and leads to lower bounds for an array of decision, optimization, and estimation problems involving graphs. Strikingly, several of these problems are deemed impossible unless the product of a GNNmp’s depth and width exceeds a polynomial of the graph size; this dependence remains significant even for tasks that appear simple or when considering approximation.
Tasks
Published	2019-07-06
URL	https://arxiv.org/abs/1907.03199v2
PDF	https://arxiv.org/pdf/1907.03199v2.pdf
PWC	https://paperswithcode.com/paper/what-graph-neural-networks-cannot-learn-depth
Repo
Framework

Facial Image Deformation Based on Landmark Detection


Title	Facial Image Deformation Based on Landmark Detection
Authors	Chaoyue Song, Yugang Chen, Shulai Zhang, Bingbing Ni
Abstract	In this work, we use facial landmarks to make the deformation of facial images more authentic and verisimilar. The deformation includes the expansion of eyes and the shrinking of noses, mouths, and cheeks. An advanced 106-point facial landmark detector is utilized to provide control points for deformation. Bilinear interpolation is used in the expansion part, and Moving Least Squares (MLS) methods, including Affine Deformation, Similarity Deformation and Rigid Deformation, are used in the shrinking part. We then compare the running time as well as the quality of deformed images using the different MLS methods. The experimental results show that Rigid Deformation, which keeps other parts of the image unchanged, performs best even though it takes the longest time.
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13671v1
PDF	https://arxiv.org/pdf/1910.13671v1.pdf
PWC	https://paperswithcode.com/paper/facial-image-deformation-based-on-landmark
Repo
Framework
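
Bilinear interpolation, which the abstract above names for the eye-expansion step, samples an image at a non-integer position by blending the four surrounding pixels. The following is a minimal generic sketch of that operation on a grayscale image stored as nested lists; it is not the authors' implementation, and `bilinear_sample` is a hypothetical name. Coordinates outside the image are clamped to the border, which is one possible choice among several.

```python
import math
from typing import List


def bilinear_sample(image: List[List[float]], x: float, y: float) -> float:
    """Sample a grayscale image at column x, row y using bilinear interpolation."""
    height, width = len(image), len(image[0])
    # Clamp the query point to the valid pixel grid.
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the top and bottom rows, then vertically.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


tiny = [[0.0, 10.0], [20.0, 30.0]]
print(bilinear_sample(tiny, 0.5, 0.5))  # centre of the four pixels
```

In a warping pipeline, each output pixel would be mapped back to a source position and filled with the value sampled this way.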

Attribute Acquisition in Ontology based on Representation Learning of Hierarchical Classes and Attributes


Title	Attribute Acquisition in Ontology based on Representation Learning of Hierarchical Classes and Attributes
Authors	Tianwen Jiang, Ming Liu, Bing Qin, Ting Liu
Abstract	Attribute acquisition for classes is a key step in ontology construction, which is often performed manually by community members. This paper investigates an attention-based automatic paradigm called TransATT for attribute acquisition, by learning the representation of hierarchical classes and attributes in Chinese ontology. The attributes of an entity can be acquired by merely inspecting its classes, because the entity can be regarded as an instance of its classes and inherits their attributes. To describe the class of an entity explicitly and unambiguously, we propose the class-path to represent the hierarchical classes in ontology, instead of the terminal class word of the hierarchy based on the hypernym-hyponym relation (i.e., is-a relation). The high performance of TransATT on attribute acquisition indicates the promising ability of the learned representation of class-paths and attributes. Moreover, we construct a dataset named BigCilin11k. To the best of our knowledge, this is the first Chinese dataset with abundant hierarchical classes and entities with attributes.
Tasks	Representation Learning
Published	2019-03-08
URL	http://arxiv.org/abs/1903.03282v1
PDF	http://arxiv.org/pdf/1903.03282v1.pdf
PWC	https://paperswithcode.com/paper/attribute-acquisition-in-ontology-based-on
Repo
Framework

Autonomous Reinforcement Learning of Multiple Interrelated Tasks


Title	Autonomous Reinforcement Learning of Multiple Interrelated Tasks
Authors	Vieri Giuliano Santucci, Gianluca Baldassarre, Emilio Cartoni
Abstract	Autonomous learning of multiple tasks is a fundamental capability for developing versatile artificial agents that can act in complex environments. In real-world scenarios, tasks may be interrelated (or “hierarchical”) so that a robot has to first learn to achieve some of them to set the preconditions for learning other ones. Even though different strategies have been used in robotics to tackle the acquisition of interrelated tasks, in particular within the developmental robotics framework, autonomous learning in this kind of scenario is still an open question. Building on previous research in the framework of intrinsically motivated open-ended learning, in this work we describe how this question can be addressed by working at the level of task selection, in particular by treating the multiple interrelated tasks scenario as an MDP where the system is trying to maximise its competence over all the tasks.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01374v1
PDF	https://arxiv.org/pdf/1906.01374v1.pdf
PWC	https://paperswithcode.com/paper/autonomous-reinforcement-learning-of-multiple
Repo
Framework

BehavDT: A Behavioral Decision Tree Learning to Build User-Centric Context-Aware Predictive Model


Title	BehavDT: A Behavioral Decision Tree Learning to Build User-Centric Context-Aware Predictive Model
Authors	Iqbal H. Sarker, Alan Colman, Jun Han, Asif Irshad Khan, Yoosef B. Abushark, Khaled Salah
Abstract	This paper formulates the problem of building a context-aware predictive model based on users’ diverse behavioral activities with smartphones. In machine learning and data science, a tree-like model such as a decision tree is considered one of the most popular classification techniques and can be used to build a data-driven predictive model. The traditional decision tree model typically creates a number of leaf nodes as decision nodes that represent context-specific rigid decisions, and consequently may cause an overfitting problem in behavior modeling. However, in many practical scenarios within the context-aware environment, generalized outcomes could play an important role in effectively capturing user behavior. In this paper, we propose a behavioral decision tree, the “BehavDT” context-aware model, that takes into account user behavior-oriented generalization according to individual preference level. The BehavDT model outputs not only the generalized decisions but also the context-specific decisions in relevant exceptional cases. The effectiveness of our BehavDT model is studied by conducting experiments on real smartphone datasets of individual users. Our experimental results show that the proposed BehavDT context-aware model is more effective than traditional machine learning approaches in predicting diverse user behaviors across multi-dimensional contexts.
Tasks
Published	2019-12-17
URL	https://arxiv.org/abs/2001.00621v1
PDF	https://arxiv.org/pdf/2001.00621v1.pdf
PWC	https://paperswithcode.com/paper/behavdt-a-behavioral-decision-tree-learning
Repo
Framework