April 2, 2020

Paper Group ANR 309

Marketplace for AI Models

Title Marketplace for AI Models
Authors Abhishek Kumar, Benjamin Finley, Tristan Braud, Sasu Tarkoma, Pan Hui
Abstract Artificial intelligence shows promise for solving many practical societal problems in areas such as healthcare and transportation. However, the current mechanisms for AI model diffusion, such as GitHub code repositories, academic project webpages, and commercial AI marketplaces, have some limitations, for example, a lack of monetization methods, model traceability, and model auditability. In this work, we sketch guidelines for a new AI diffusion method based on a decentralized online marketplace. We consider the technical, economic, and regulatory aspects of such a marketplace, including a discussion of solutions for problems in these areas. Finally, we include a comparative analysis of several current AI marketplaces that are already available or in development. We find that most of these marketplaces are centralized commercial marketplaces with relatively few models.
Tasks
Published 2020-03-03
URL https://arxiv.org/abs/2003.01593v1
PDF https://arxiv.org/pdf/2003.01593v1.pdf
PWC https://paperswithcode.com/paper/marketplace-for-ai-models
Repo
Framework

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Title UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Authors Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Xilin Chen, Ming Zhou
Abstract We propose UniViLM: a Unified Video and Language pre-training Model for multimodal understanding and generation. Motivated by the recent success of BERT-based pre-training techniques for NLP and image-language tasks, VideoBERT and CBT were proposed to exploit the BERT model for video and language pre-training using narrated instructional videos. Unlike these works, which pre-train only for understanding tasks, we propose a unified video-language pre-training model for both understanding and generation tasks. Our model comprises four components with a Transformer backbone: two single-modal encoders, a cross encoder, and a decoder. We first pre-train our model on a large instructional video dataset to learn universal representations for both video and language. Then we fine-tune the model on two multimodal tasks: an understanding task (text-based video retrieval) and a generation task (multimodal video captioning). Our extensive experiments show that our method improves performance on both understanding and generation tasks and achieves state-of-the-art results.
Tasks Video Captioning, Video Retrieval
Published 2020-02-15
URL https://arxiv.org/abs/2002.06353v1
PDF https://arxiv.org/pdf/2002.06353v1.pdf
PWC https://paperswithcode.com/paper/univilm-a-unified-video-and-language-pre
Repo
Framework

Robust Semantic Segmentation of Brain Tumor Regions from 3D MRIs

Title Robust Semantic Segmentation of Brain Tumor Regions from 3D MRIs
Authors Andriy Myronenko, Ali Hatamizadeh
Abstract The Multimodal Brain Tumor Segmentation Challenge (BraTS) brings together researchers to improve automated methods for 3D MRI brain tumor segmentation. Tumor segmentation is one of the fundamental vision tasks necessary for diagnosis and treatment planning of the disease. Previous years' winning methods were all deep-learning based, thanks to the advent of modern GPUs, which allow fast optimization of deep convolutional neural network architectures. In this work, we explore best practices of 3D semantic segmentation, including a conventional encoder-decoder architecture as well as combined loss functions, in an attempt to further improve segmentation accuracy. We evaluate the method on the BraTS 2019 challenge.
Tasks 3D Semantic Segmentation, Brain Tumor Segmentation, Semantic Segmentation
Published 2020-01-06
URL https://arxiv.org/abs/2001.02040v1
PDF https://arxiv.org/pdf/2001.02040v1.pdf
PWC https://paperswithcode.com/paper/robust-semantic-segmentation-of-brain-tumor
Repo
Framework
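The abstract mentions combined loss functions without naming them. As a minimal sketch, assuming a soft Dice term plus binary cross-entropy (a common pairing in tumor segmentation, but not confirmed by the abstract), the combined loss for one tumor region over flattened voxel probabilities could look like this. The weights `dice_weight` and `bce_weight` are hypothetical parameters.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-6):
    # 1 - soft Dice coefficient between predicted probabilities and 0/1 labels.
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def bce_loss(probs, targets, eps=1e-7):
    # Mean binary cross-entropy; probabilities are clamped to avoid log(0).
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def combined_loss(probs, targets, dice_weight=1.0, bce_weight=1.0):
    # Hypothetical weighted sum of the two terms.
    return (dice_weight * soft_dice_loss(probs, targets)
            + bce_weight * bce_loss(probs, targets))
```

The Dice term directly rewards overlap and is insensitive to the large number of background voxels, while cross-entropy gives smooth per-voxel gradients; in a real 3D pipeline these sums would run over voxel tensors rather than Python lists.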

Wavesplit: End-to-End Speech Separation by Speaker Clustering

Title Wavesplit: End-to-End Speech Separation by Speaker Clustering
Authors Neil Zeghidour, David Grangier
Abstract We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.
Tasks Data Augmentation, Speech Separation
Published 2020-02-20
URL https://arxiv.org/abs/2002.08933v1
PDF https://arxiv.org/pdf/2002.08933v1.pdf
PWC https://paperswithcode.com/paper/wavesplit-end-to-end-speech-separation-by
Repo
Framework
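The permutation problem mentioned above arises because a separation model's output channels have no fixed order, so a loss that compares output i with reference speaker i can penalize a perfectly good separation. The sketch below shows the conventional baseline remedy, permutation-invariant training (PIT), which scores every assignment of outputs to references and keeps the best. It is not Wavesplit's clustering approach, and the mean-squared-error criterion is an assumption chosen for brevity.

```python
from itertools import permutations

def mse(a, b):
    # Mean squared error between two equal-length signals.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, references):
    """Return (best_loss, best_perm): the lowest average MSE over all
    assignments of estimated sources to reference sources."""
    n = len(references)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the estimate assigned to reference i.
        loss = sum(mse(estimates[perm[i]], references[i]) for i in range(n)) / n
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm
```

Enumerating permutations costs O(n!), which is acceptable for the 2- and 3-speaker mixtures listed above; the abstract's claim is that Wavesplit's sequence-wide speaker representations handle this ordering more robustly on long sequences.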

Generalized Embedding Machines for Recommender Systems

Title Generalized Embedding Machines for Recommender Systems
Authors Enneng Yang, Xin Xin, Li Shen, Guibing Guo
Abstract The factorization machine (FM) is an effective model for feature-based recommendation that uses inner products to capture second-order feature interactions. However, one of the major drawbacks of FM is that it cannot capture complex high-order interaction signals. A common solution is to change the interaction function, for example by stacking deep neural networks on top of FM. In this work, we propose an alternative approach, the Generalized Embedding Machine (GEM), which models high-order interaction signals at the embedding level. The embedding used in GEM encodes not only information from the feature itself but also information from other correlated features, so the embedding becomes high-order. GEM can then be combined with FM and even its advanced variants to perform feature interactions. More specifically, in this paper we use graph convolutional networks (GCN) to generate high-order embeddings. We integrate GEM with several FM-based models and conduct extensive experiments on two real-world datasets. The results demonstrate significant improvement of GEM over the corresponding baselines.
Tasks Recommendation Systems
Published 2020-02-16
URL https://arxiv.org/abs/2002.06561v1
PDF https://arxiv.org/pdf/2002.06561v1.pdf
PWC https://paperswithcode.com/paper/generalized-embedding-machines-for
Repo
Framework
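For reference, the second-order FM term that GEM builds on sums <v_i, v_j> x_i x_j over all feature pairs i < j, and it can be computed in O(kn) with the standard identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 − sum_i (v_{i,f} x_i)^2 ]. The sketch below is a plain FM, not GEM itself, and checks the fast form against the naive double loop. Per the abstract, GEM would replace the per-feature embeddings `v` with high-order embeddings produced by a GCN; that step is not shown, and the function names are hypothetical.

```python
def fm_pairwise_naive(x, v):
    # Direct O(k * n^2) sum over all feature pairs i < j.
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(x, v):
    # O(k * n) reformulation of the same pairwise interaction term.
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total

def fm_predict(x, w0, w, v):
    # Global bias + linear terms + second-order interactions.
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return linear + fm_pairwise_fast(x, v)
```

Usage: with `x = [1.0, 0.0, 2.0]` and `v = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]`, only the pair (0, 2) contributes, giving 1.5 * 1.0 * 2.0 = 3.0 from both functions.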

Stochastic encoding of graphs in deep learning allows for complex analysis of gender classification in resting-state and task functional brain networks from the UK Biobank

Title Stochastic encoding of graphs in deep learning allows for complex analysis of gender classification in resting-state and task functional brain networks from the UK Biobank
Authors Matthew Leming, John Suckling
Abstract Classification of whole-brain functional connectivity MRI data with convolutional neural networks (CNNs) has shown promise, but the complexity of these models impedes understanding of which aspects of brain activity contribute to classification. While visualization techniques have been developed to interpret CNNs, both the bias inherent in the method of encoding abstract input data and the natural variance of deep learning models detract from the accuracy of these techniques. We introduce a stochastic encoding method in an ensemble of CNNs to classify functional connectomes by gender. We applied our method to resting-state and task data from the UK Biobank, using two visualization techniques to measure the salience of three brain networks involved in task and resting states, and their interaction. To regress out confounding factors such as head motion, age, and intracranial volume, we introduced a multivariate balancing algorithm to ensure equal distributions of such covariates between classes in our data. We achieved a final AUROC of 0.8459. We found that resting-state data classifies more accurately than task data, with the inner salience network playing the most important role of the three networks overall in classification of resting-state data and connections to the central executive network in task data.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.10936v1
PDF https://arxiv.org/pdf/2002.10936v1.pdf
PWC https://paperswithcode.com/paper/stochastic-encoding-of-graphs-in-deep
Repo
Framework

Secure and Robust Machine Learning for Healthcare: A Survey

Title Secure and Robust Machine Learning for Healthcare: A Survey
Authors Adnan Qayyum, Junaid Qadir, Muhammad Bilal, Ala Al-Fuqaha
Abstract Recent years have witnessed widespread adoption of machine learning (ML)/deep learning (DL) techniques due to their superior performance for a variety of healthcare applications, ranging from the prediction of cardiac arrest from one-dimensional heart signals to computer-aided diagnosis (CADx) using multi-dimensional medical images. Notwithstanding the impressive performance of ML/DL, there are still lingering doubts regarding the robustness of ML/DL in healthcare settings (which are traditionally considered quite challenging due to the myriad security and privacy issues involved), especially in light of recent results showing that ML/DL are vulnerable to adversarial attacks. In this paper, we present an overview of various application areas in healthcare that leverage such techniques from a security and privacy point of view and present the associated challenges. In addition, we present potential methods to ensure secure and privacy-preserving ML for healthcare applications. Finally, we provide insight into current research challenges and promising directions for future research.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.08103v1
PDF https://arxiv.org/pdf/2001.08103v1.pdf
PWC https://paperswithcode.com/paper/secure-and-robust-machine-learning-for
Repo
Framework

RIS Enhanced Massive Non-orthogonal Multiple Access Networks: Deployment and Passive Beamforming Design

Title RIS Enhanced Massive Non-orthogonal Multiple Access Networks: Deployment and Passive Beamforming Design
Authors Xiao Liu, Yuanwei Liu, Yue Chen, H. Vincent Poor
Abstract A novel framework is proposed for the deployment and passive beamforming design of a reconfigurable intelligent surface (RIS) with the aid of non-orthogonal multiple access (NOMA) technology. The problem of joint deployment, phase shift design, and power allocation is formulated to maximize energy efficiency while considering users' particular data requirements. To tackle this problem, machine learning approaches are adopted in two steps. First, a novel long short-term memory (LSTM) based echo state network (ESN) algorithm is proposed to predict users' tele-traffic demand by leveraging a real dataset. Second, a decaying double deep Q-network (D3QN) based position-acquisition and phase-control algorithm is proposed to solve the joint problem of deployment and design of the RIS. In the proposed algorithm, the base station, which controls the RIS via a controller, acts as an agent. The agent periodically observes the state of the RIS-enhanced system to attain the optimal deployment and design policies of the RIS by learning from its mistakes and from users' feedback. Additionally, it is proved that the proposed D3QN-based deployment and design algorithm converges under mild conditions. Simulation results illustrate that the proposed LSTM-based ESN algorithm strikes a tradeoff between prediction accuracy and computational complexity. Finally, it is demonstrated that the proposed D3QN-based algorithm outperforms the benchmarks, while the NOMA-enhanced RIS system achieves higher energy efficiency than an orthogonal multiple access (OMA) enabled RIS system.
Tasks
Published 2020-01-28
URL https://arxiv.org/abs/2001.10363v1
PDF https://arxiv.org/pdf/2001.10363v1.pdf
PWC https://paperswithcode.com/paper/ris-enhanced-massive-non-orthogonal-multiple
Repo
Framework
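The abstract names a decaying double deep Q-network (D3QN) but does not spell out its update. As a hedged sketch, the double-DQN target such a method presumably builds on lets the online network choose the next action while the target network evaluates it, which reduces the overestimation bias of taking a max over the target network alone. The linear epsilon schedule is only an assumption about what "decaying" refers to (the abstract does not say), and every name here is hypothetical.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Bootstrapped targets for a batch of transitions.
    next_q_online / next_q_target: per-transition lists of Q-values for s'."""
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)  # terminal transition: no bootstrap term
            continue
        a_star = max(range(len(q_on)), key=lambda a: q_on[a])  # online net selects
        targets.append(r + gamma * q_tg[a_star])  # target net evaluates
    return targets

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    # Hypothetical linearly decaying exploration rate, clamped at eps_end.
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

In the usage below, the online network prefers action 0, so the target is 1.0 + 0.5 * 2.0 = 2.0, whereas a plain max over the target network would have used 3.0.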

A Video Analysis Method on Wanfang Dataset via Deep Neural Network

Title A Video Analysis Method on Wanfang Dataset via Deep Neural Network
Authors Jinlong Kang, Jiaxiang Zheng, Heng Bai, Xiaoting Xue, Yang Zhou, Jun Guo
Abstract Object detection has improved considerably in recent years, especially with the development of convolutional neural networks. However, many challenging cases remain, such as small, compact and dense, or highly overlapping objects. Existing methods can detect multiple objects well, but because of slight changes between frames, the model's detections become unstable: objects may be dropped or spuriously added from one frame to the next. In the pedestrian flow detection task, this phenomenon prevents the flow from being counted accurately. To solve this problem, in this paper we describe a new deep-learning-based function for real-time multi-object detection in sports competitions and pedestrian flow detection in public places. Our approach extracts a video clip and processes its frames efficiently. More specifically, our algorithm includes two stages: a judge method and an optimization method. The judge method sets a maximum threshold for better results under the model; the threshold value corresponds to the upper limit of the algorithm with better detection results. The optimization method addresses the detection jitter problem, which arises because frame hopping in the video produces discontinuous video fragments. We use an optimization algorithm to obtain a key value, and the detection result value at each index is then replaced by the key value to stabilize the detection result sequence. Based on the proposed algorithm, we use the Wanfang sports competition dataset as the main test dataset, together with our own test dataset, for YOLOv3-Abnormal Number Version (YOLOv3-ANV), which achieves a 5.4% average improvement compared with existing methods. In addition, video segments above the threshold value can be extracted for further analysis. Our work can also be used for pedestrian flow detection and pedestrian alarm tasks.
Tasks Object Detection
Published 2020-02-28
URL https://arxiv.org/abs/2002.12535v1
PDF https://arxiv.org/pdf/2002.12535v1.pdf
PWC https://paperswithcode.com/paper/a-video-analysis-method-on-wanfang-dataset
Repo
Framework
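The abstract describes replacing jittery per-frame detection results with an optimized "key value" but does not define how that value is computed. As a stand-in only, a sliding-window median over per-frame object counts shows the general idea of stabilizing a detection-count sequence against frames where an object is briefly dropped or added. This is a swapped-in technique, not the paper's optimization algorithm, and `window` is a hypothetical parameter.

```python
from statistics import median

def smooth_counts(counts, window=5):
    """Replace each per-frame object count with the median of a centered
    window; the window is truncated at the start and end of the clip."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(median(counts[lo:hi]))
    return smoothed
```

A median is used rather than a mean so that a single dropped or duplicated detection does not shift the count at all, which matters when the counts feed a pedestrian flow total.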

Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields

Title Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields
Authors Michael Ramamonjisoa, Yuming Du, Vincent Lepetit
Abstract Current methods for depth map prediction from monocular images tend to predict smooth, poorly localized contours for the occlusion boundaries in the input image. This is unfortunate as occlusion boundaries are important cues to recognize objects, and as we show, may lead to a way to discover new objects from scene reconstruction. To improve predicted depth maps, recent methods rely on various forms of filtering or predict an additive residual depth map to refine a first estimate. We instead learn to predict, given a depth map predicted by some reconstruction method, a 2D displacement field able to re-sample pixels around the occlusion boundaries into sharper reconstructions. Our method can be applied to the output of any depth estimation method, in an end-to-end trainable fashion. For evaluation, we manually annotated the occlusion boundaries in all the images in the test split of the popular NYUv2-Depth dataset. We show that our approach improves the localization of occlusion boundaries for all state-of-the-art monocular depth estimation methods that we could evaluate, without degrading the depth accuracy for the rest of the images.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2020-02-28
URL https://arxiv.org/abs/2002.12730v2
PDF https://arxiv.org/pdf/2002.12730v2.pdf
PWC https://paperswithcode.com/paper/predicting-sharp-and-accurate-occlusion
Repo
Framework
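The core operation described above re-samples a predicted depth map with a per-pixel 2D displacement field, so that pixels near a blurred occlusion boundary take their depth from a nearby pixel on the correct side of the edge. The sketch below uses nearest-neighbor sampling with border clamping for clarity; an end-to-end trainable version would need a differentiable sampler such as bilinear interpolation, and the field here is hand-written rather than predicted by a network.

```python
def resample_with_displacement(depth, disp):
    """depth: H x W list of lists of depth values.
    disp: H x W list of (dy, dx) offsets in pixels.
    Output pixel (y, x) copies the depth at (y + dy, x + dx), rounded to
    the nearest pixel and clamped to the image border."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = disp[y][x]
            sy = min(max(int(round(y + dy)), 0), h - 1)
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y][x] = depth[sy][sx]
    return out
```

Usage: a one-row depth map `[[1.0, 1.0, 1.5, 3.0, 3.0]]` has a smeared value of 1.5 at the edge; a displacement of `(0, -1)` at that pixel (zero elsewhere) pulls in the foreground depth and yields the sharp step `[[1.0, 1.0, 1.0, 3.0, 3.0]]`. Unlike an additive residual, re-sampling can only reuse depth values already present nearby, which is what keeps the boundary sharp.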

Plannable Approximations to MDP Homomorphisms: Equivariance under Actions

Title Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
Authors Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, Max Welling
Abstract This work exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions states that transitions in the input space are mirrored by equivalent transitions in latent space, while the map and transition functions should also commute. We introduce a contrastive loss function that enforces action equivariance on the learned representations. We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process (MDP). Learning equivariant maps leads to structured latent spaces, allowing us to build a model on which we plan through value iteration. We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP. Moreover, the approach easily adapts to changes in the goal states. Empirically, we show that in such MDPs, we obtain better representations in fewer epochs compared to representation learning approaches using reconstructions, while generalizing better to new goals than model-free approaches.
Tasks Representation Learning
Published 2020-02-27
URL https://arxiv.org/abs/2002.11963v1
PDF https://arxiv.org/pdf/2002.11963v1.pdf
PWC https://paperswithcode.com/paper/plannable-approximations-to-mdp-homomorphisms
Repo
Framework
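Action equivariance asks that applying the latent transition to the encoding of a state s lands on the encoding of the next state s'. A minimal sketch of a contrastive loss of this kind is below, assuming a C-SWM-style hinge on a negative sample with squared Euclidean distance. The exact distance, the margin, and which latent the negative is contrasted against are assumptions, and `shift_transition` is a fixed, hypothetical transition rather than a learned network.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two latent vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def shift_transition(z, action):
    # Hypothetical latent transition: the action translates the first coordinate.
    return [z[0] + action] + list(z[1:])

def equivariance_loss(z, action, z_next, z_neg, transition=shift_transition, margin=1.0):
    predicted = transition(z, action)
    # Positive term: T(Z(s), a) should coincide with Z(s').
    positive = sq_dist(predicted, z_next)
    # Hinge term: a negative state's encoding should stay at least `margin`
    # (in squared distance) away from Z(s'), which prevents collapse.
    negative = max(0.0, margin - sq_dist(z_neg, z_next))
    return positive + negative
```

Without the hinge term, mapping every state to the same point would trivially make the positive term zero; the negative sample rules out that degenerate solution, which is the role contrastive losses play in place of reconstruction.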

Residual Continual Learning

Title Residual Continual Learning
Authors Janghyeon Lee, Donggyu Joo, Hyeong Gwon Hong, Junmo Kim
Abstract We propose a novel continual learning method called Residual Continual Learning (ResCL). Our method can prevent the catastrophic forgetting phenomenon in sequential learning of multiple tasks, without any source task information except the original network. ResCL reparameterizes network parameters by linearly combining each layer of the original network and a fine-tuned network; therefore, the size of the network does not increase at all. To apply the proposed method to general convolutional neural networks, the effects of batch normalization layers are also considered. By utilizing residual-learning-like reparameterization and a special weight decay loss, the trade-off between source and target performance is effectively controlled. The proposed method exhibits state-of-the-art performance in various continual learning scenarios.
Tasks Continual Learning
Published 2020-02-17
URL https://arxiv.org/abs/2002.06774v1
PDF https://arxiv.org/pdf/2002.06774v1.pdf
PWC https://paperswithcode.com/paper/residual-continual-learning
Repo
Framework
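The claim that the network size does not increase follows from linearity: a weighted sum of two linear layers applied to the same input is itself a single linear layer whose weights and bias are the same weighted sums. The sketch below folds two fully connected layers this way, assuming per-output-channel coefficients `alpha_src` and `alpha_tgt`. This is a hypothetical parameterization; the abstract does not give ResCL's exact formula or its weight decay loss.

```python
def linear(W, b, x):
    # y = W x + b for a single input vector x.
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def combine_layers(W_src, b_src, W_tgt, b_tgt, alpha_src, alpha_tgt):
    """Fold alpha_src * layer_src(x) + alpha_tgt * layer_tgt(x) into one layer.
    alpha_src / alpha_tgt hold one coefficient per output channel."""
    W = [[a_s * ws + a_t * wt for ws, wt in zip(row_s, row_t)]
         for row_s, row_t, a_s, a_t in zip(W_src, W_tgt, alpha_src, alpha_tgt)]
    b = [a_s * bs + a_t * bt
         for bs, bt, a_s, a_t in zip(b_src, b_tgt, alpha_src, alpha_tgt)]
    return W, b
```

Convolutions fold the same way because they are also linear in their weights; batch normalization layers do not fold this simply during training, which is presumably why the abstract says their effects are treated separately.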

HGAT: Hierarchical Graph Attention Network for Fake News Detection

Title HGAT: Hierarchical Graph Attention Network for Fake News Detection
Authors Yuxiang Ren, Jiawei Zhang
Abstract The explosive growth of fake news has eroded the credibility of the media and governments. Fake news detection has become an urgent task. News articles, along with other related components like news creators and news subjects, can be modeled as a heterogeneous information network (HIN for short). In this paper, we focus on the HIN-based fake news detection problem. We propose a novel fake news detection framework, the Hierarchical Graph Attention Network (HGAT), which employs a novel hierarchical attention mechanism to detect fake news by classifying news article nodes in the HIN. This method can effectively learn information from different types of related nodes through node-level and schema-level attention. Experiments with real-world fake news data show that our model can outperform text-based models and other network-based models. The experiments also demonstrate the expandability and potential of HGAT for heterogeneous graph representation learning in the future.
Tasks Fake News Detection, Representation Learning
Published 2020-02-05
URL https://arxiv.org/abs/2002.04397v1
PDF https://arxiv.org/pdf/2002.04397v1.pdf
PWC https://paperswithcode.com/paper/hgat-hierarchical-graph-attention-network-for
Repo
Framework
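The two attention levels can be illustrated with a toy aggregation: node-level attention summarizes the neighbors of each node type (schema), and schema-level attention then weighs those per-type summaries. This sketch scores with a plain dot product against a query vector, an assumption standing in for HGAT's learned attention parameters, and the node types in the test are illustrative.

```python
import math

def softmax(scores):
    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, vectors):
    # Weighted sum of vectors with weights softmax(<query, v>).
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def hierarchical_attention(query, neighbors_by_type):
    """neighbors_by_type: dict mapping a node type to its neighbor embeddings.
    Node level: one summary per type. Schema level: attend over the summaries."""
    type_summaries = [attend(query, vecs) for vecs in neighbors_by_type.values()]
    return attend(query, type_summaries)
```

Splitting attention by type keeps a node type with many neighbors (for example, many subjects) from dominating a type with few (a single creator), since each type is first reduced to one summary vector.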

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

Title Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization
Authors Yi Zhou, Zhe Wang, Kaiyi Ji, Yingbin Liang, Vahid Tarokh
Abstract Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their practical convergence in convex optimization. However, the convergence properties of accelerated gradient algorithms under parameter restart remain obscure in nonconvex optimization. In this paper, we propose a novel accelerated proximal gradient algorithm with parameter restart (named APG-restart) for solving nonconvex and nonsmooth problems. Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the effectiveness of our proposed algorithm.
Tasks
Published 2020-02-26
URL https://arxiv.org/abs/2002.11582v1
PDF https://arxiv.org/pdf/2002.11582v1.pdf
PWC https://paperswithcode.com/paper/proximal-gradient-algorithm-with-momentum-and
Repo
Framework
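To make the idea of parameter restart concrete, here is a minimal sketch of an accelerated proximal gradient method with one well-known restart rule: reset the momentum whenever the objective increases. It is applied to l1-regularized least squares, a convex problem chosen only because its proximal operator (soft-thresholding) is simple. The paper targets nonconvex, nonsmooth problems and supports more general restart schemes, so this is not APG-restart itself. The step size is assumed to be at most 1/L, where L is the largest eigenvalue of A^T A.

```python
import math

def soft_threshold(v, t):
    # Proximal operator of t * ||x||_1, applied elementwise.
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def residual(A, b, x):
    return [ri - bi for ri, bi in zip(matvec(A, x), b)]

def grad_ls(A, b, x):
    # Gradient of 0.5 * ||Ax - b||^2, i.e. A^T (Ax - b).
    r = residual(A, b, x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def objective(A, b, lam, x):
    r = residual(A, b, x)
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def apg_restart(A, b, lam, step, iters=200):
    x = [0.0] * len(A[0])
    y = x[:]
    t = 1.0
    f_prev = objective(A, b, lam, x)
    for _ in range(iters):
        g = grad_ls(A, b, y)
        x_new = soft_threshold([yi - step * gi for yi, gi in zip(y, g)], step * lam)
        f_new = objective(A, b, lam, x_new)
        if f_new > f_prev:
            # Restart: discard momentum; the next pass takes a plain
            # proximal gradient step from the current iterate x.
            t = 1.0
            y = x[:]
            continue
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # FISTA-style extrapolation from the last two accepted iterates.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t, f_prev = x_new, t_new, f_new
    return x
```

With `A = [[2.0, 0.0], [0.0, 1.0]]`, `b = [4.0, 1.0]`, `lam = 0.5`, and `step = 0.25` (L = 4), each coordinate decouples and the minimizer is (1.875, 0.5). Because a plain proximal step with step at most 1/L never increases the objective, a restart is always followed by progress rather than an endless reset loop.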

To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers

Title To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers
Authors Bijeeta Pal, Shruti Tople
Abstract Transfer learning (transferring learned knowledge) has brought a paradigm shift in the way models are trained. The lucrative benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT that are trained on large corpora have seen ubiquitous adoption in practice. In this paper, we ask, “can transfer learning in text prediction models be exploited to perform misclassification attacks?” As our main contribution, we present novel attack techniques that utilize unintended features learnt in the teacher (public) model to generate adversarial examples for student (downstream) models. To the best of our knowledge, ours is the first work to show that transfer learning from state-of-the-art word-based and sentence-based teacher models increases the susceptibility of student models to misclassification attacks. First, we propose a novel word-score based attack algorithm for generating adversarial examples against student models trained using a context-free word-level embedding model. On binary classification tasks trained using the GloVe teacher model, we achieve an average attack accuracy of 97% for IMDB Movie Reviews and 80% for Fake News Detection. For multi-class tasks, we divide the Newsgroup dataset into 6 and 20 classes and achieve an average attack accuracy of 75% and 41%, respectively. Next, we present length-based and sentence-based misclassification attacks for the Fake News Detection task trained using a context-aware BERT model and achieve 78% and 39% attack accuracy, respectively. Thus, our results motivate the need for designing training techniques that are robust to unintended feature learning, specifically for transfer learned models.
Tasks Fake News Detection, Transfer Learning
Published 2020-01-08
URL https://arxiv.org/abs/2001.02438v1
PDF https://arxiv.org/pdf/2001.02438v1.pdf
PWC https://paperswithcode.com/paper/to-transfer-or-not-to-transfer
Repo
Framework