April 2, 2020

3587 words 17 mins read

Paper Group ANR 328



Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning

Title Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning
Authors Gaole He, Junyi Li, Wayne Xin Zhao, Peiju Liu, Ji-Rong Wen
Abstract The task of Knowledge Graph Completion (KGC) aims to automatically infer missing fact information in a Knowledge Graph (KG). In this paper, we take a new perspective and aim to leverage rich user-item interaction data (user interaction data for short) to improve the KGC task. Our work is inspired by the observation that many KG entities correspond to online items in application systems. However, the two kinds of data sources have very different intrinsic characteristics, and a simple fusion strategy is likely to hurt the original performance. To address this challenge, we propose a novel adversarial learning approach that leverages user interaction data for the KGC task. Our generator is isolated from the user interaction data and serves to improve the performance of the discriminator. The discriminator takes the useful information learned from user interaction data as input and gradually enhances its evaluation capacity in order to identify the fake samples generated by the generator. To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks, which is jointly optimized with the discriminator. This approach effectively alleviates the issues of data heterogeneity and semantic complexity in the KGC task. Extensive experiments on three real-world datasets demonstrate the effectiveness of our approach on the KGC task.
Tasks Knowledge Graph Completion
Published 2020-03-28
URL https://arxiv.org/abs/2003.12718v1
PDF https://arxiv.org/pdf/2003.12718v1.pdf
PWC https://paperswithcode.com/paper/mining-implicit-entity-preference-from-user
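The abstract does not detail the score function or the generator, so the following is only a minimal sketch of the general pattern it describes: a triple-scoring discriminator (here a standard TransE-style translational score, not the authors' model) trained against negatives proposed by a generator (here a trivial random corrupter). All names, dimensions, and the margin value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
# Hypothetical toy embeddings: 4 entities and 2 relations.
entity_emb = rng.normal(size=(4, dim))
relation_emb = rng.normal(size=(2, dim))

def score(head, rel, tail):
    """TransE-style plausibility: higher (less negative) means more plausible."""
    return -np.linalg.norm(entity_emb[head] + relation_emb[rel] - entity_emb[tail])

def corrupt_tail(head, rel, tail):
    """A trivial stand-in 'generator': propose a random corrupted tail as a negative."""
    candidates = [e for e in range(len(entity_emb)) if e != tail]
    return int(rng.choice(candidates))

# A discriminator would be trained so that true triples outscore generated
# negatives, e.g. via a margin loss: max(0, margin - score(true) + score(fake)).
true_s = score(0, 1, 2)
fake_s = score(0, 1, corrupt_tail(0, 1, 2))
margin_loss = max(0.0, 1.0 - true_s + fake_s)
```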

Deep RL Agent for a Real-Time Action Strategy Game

Title Deep RL Agent for a Real-Time Action Strategy Game
Authors Michal Warchalski, Dimitrije Radojevic, Milos Milosevic
Abstract We introduce a reinforcement learning environment based on Heroic - Magic Duel, a 1v1 action strategy game. This domain is non-trivial for several reasons: it is a real-time game, the state space is large, the information given to the player before and at each step of a match is imperfect, and the distribution of actions is dynamic. Our main contribution is a deep reinforcement learning agent that plays the game at a competitive level, trained using PPO and self-play with multiple competing agents, employing only a simple reward of $\pm 1$ depending on the outcome of a single match. Our best self-play agent obtains around a 65% win rate against the existing AI and over a 50% win rate against a top human player.
Published 2020-02-15
URL https://arxiv.org/abs/2002.06290v1
PDF https://arxiv.org/pdf/2002.06290v1.pdf
PWC https://paperswithcode.com/paper/deep-rl-agent-for-a-real-time-action-strategy

Plague Dot Text: Text mining and annotation of outbreak reports of the Third Plague Pandemic (1894-1952)

Title Plague Dot Text: Text mining and annotation of outbreak reports of the Third Plague Pandemic (1894-1952)
Authors Arlene Casey, Mike Bennett, Richard Tobin, Claire Grover, Iona Walker, Lukas Engelmann, Beatrice Alex
Abstract The design of models that govern diseases in populations is commonly built on information and data gathered from past outbreaks. However, epidemic outbreaks are never captured in statistical data alone but are communicated by narratives, supported by empirical observations. Outbreak reports discuss correlations between populations, locations and the disease to infer insights into causes, vectors and potential interventions. The problem with these narratives is usually the lack of consistent structure or strong conventions, which prohibits their formal analysis in larger corpora. Our interdisciplinary research investigates more than 100 reports from the third plague pandemic (1894-1952), evaluating ways of building a corpus to extract and structure this narrative information through text mining and manual annotation. In this paper we discuss the progress of our ongoing exploratory project, how we enhance optical character recognition (OCR) methods to improve text capture, and our approach to structuring the narratives and identifying relevant entities in the reports. The structured corpus is made available via Solr, enabling search and analysis across the whole collection for future research dedicated, for example, to the identification of concepts. We show preliminary visualisations of the characteristics of causation and differences with respect to gender as a result of syntactic-category-dependent corpus statistics. Our goal is to develop structured accounts of some of the most significant concepts that were used to understand the epidemiology of the third plague pandemic around the globe. The corpus enables researchers to analyse the reports collectively, allowing for deep insights into the global epidemiological consideration of plague in the early twentieth century.
Tasks Epidemiology, Optical Character Recognition
Published 2020-02-04
URL https://arxiv.org/abs/2002.01415v1
PDF https://arxiv.org/pdf/2002.01415v1.pdf
PWC https://paperswithcode.com/paper/plague-dot-text-text-mining-and-annotation-of

Acoustic Scene Classification Using Bilinear Pooling on Time-liked and Frequency-liked Convolution Neural Network

Title Acoustic Scene Classification Using Bilinear Pooling on Time-liked and Frequency-liked Convolution Neural Network
Authors Xing Yong Kek, Cheng Siong Chin, Ye Li
Abstract The current methodology for tackling the Acoustic Scene Classification (ASC) task can be described in two steps: preprocessing the audio waveform into a log-mel spectrogram, and then using it as the input representation for a Convolutional Neural Network (CNN). This paradigm shift occurred after DCASE 2016, where this framework achieved state-of-the-art results in ASC tasks: on the ESC-50 dataset it reached an accuracy of 64.5%, a 20.5% improvement over the baseline model, and on the DCASE 2016 dataset accuracies of 90.0% (development) and 86.2% (evaluation), improvements of 6.4% and 9% respectively over the baseline system. In this paper, we explore the use of harmonic and percussive source separation (HPSS), which has gained popularity in the field of music information retrieval (MIR), to split the audio into harmonic and percussive components. Although prior work has used HPSS as the input representation for CNN models in the ASC task, this paper further investigates the possibility of leveraging the separated harmonic and percussive components by curating two CNNs that try to understand harmonic and percussive audio in their natural form: one specialized in extracting deep features in a time-biased domain and the other in a frequency-biased domain. The deep features extracted from these two CNNs are then combined using bilinear pooling, yielding a two-stream time and frequency CNN architecture for classifying acoustic scenes. The model is evaluated on the DCASE 2019 sub-task 1a dataset and scored an average of 65% on the development dataset and the Kaggle private and public leaderboards.
Tasks Acoustic Scene Classification, Information Retrieval, Music Information Retrieval, Scene Classification
Published 2020-02-14
URL https://arxiv.org/abs/2002.07065v1
PDF https://arxiv.org/pdf/2002.07065v1.pdf
PWC https://paperswithcode.com/paper/acoustic-scene-classification-using-bilinear
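The bilinear pooling step the abstract mentions can be sketched as an outer product of the two streams' feature vectors. The signed square-root and L2 normalization below are the common post-processing for bilinear features, not something the abstract specifies, and the feature vectors are toy values.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Fuse two feature vectors via their outer product, then apply the
    signed square-root and L2 normalization commonly used with bilinear features."""
    outer = np.outer(feat_a, feat_b).ravel()            # shape: (len_a * len_b,)
    signed_sqrt = np.sign(outer) * np.sqrt(np.abs(outer))
    norm = np.linalg.norm(signed_sqrt)
    return signed_sqrt / (norm + eps)

# e.g. fusing a "time-biased" and a "frequency-biased" deep feature vector
time_feat = np.array([0.5, -1.0, 2.0])
freq_feat = np.array([1.0, 0.25])
fused = bilinear_pool(time_feat, freq_feat)  # 6-dimensional, unit L2 norm
```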

MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators

Title MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators
Authors Prasanth Chatarasi, Hyoukjun Kwon, Natesh Raina, Saurabh Malik, Vaisakh Haridas, Tushar Krishna, Vivek Sarkar
Abstract The efficiency of a spatial DNN accelerator depends heavily on the compiler’s ability to generate optimized mappings for a given DNN’s operators (layers) onto the accelerator’s compute and memory resources. Searching for the optimal mapping is challenging because of a massive space of possible data-layouts and loop transformations for the DNN layers. For example, there are over 10^19 valid mappings for a single convolution layer on average for mapping ResNet50 and MobileNetV2 on a representative DNN edge accelerator. This challenge gets exacerbated with new layer types (e.g., depth-wise and point-wise convolutions) and diverse hardware accelerator configurations. To address this challenge, we propose a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, and first optimizes the off-chip subspace followed by the on-chip subspace. The motivation for this decomposition is to dramatically reduce the size of the search space, and to also prioritize the optimization of off-chip data movement, which is 2-3 orders of magnitude greater than the on-chip data movement. We introduce {\em Marvel}, which implements the above approach by leveraging two cost models to explore the two subspaces – a classical distinct-block (DB) locality cost model for the off-chip subspace, and a state-of-the-art DNN accelerator behavioral cost model, MAESTRO, for the on-chip subspace. Our approach also considers dimension permutation, a form of data-layouts, in the mapping space formulation along with the loop transformations.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07752v1
PDF https://arxiv.org/pdf/2002.07752v1.pdf
PWC https://paperswithcode.com/paper/marvel-a-decoupled-model-driven-approach-for

Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis

Title Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis
Authors Alexander Schindler
Abstract This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. This thesis focuses on the information provided by the visual layer of music videos and how it can be harnessed to augment and improve tasks of the MIR research domain. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone, without the sound being heard. This leads to the hypothesis that there exists a visual language that is used to express mood or genre. As a further consequence, it can be concluded that this visual information is music-related and thus should be beneficial for the corresponding MIR tasks such as music genre classification or mood recognition. A series of comprehensive experiments and evaluations are conducted which are focused on the extraction of visual information and its application in different MIR tasks. A custom dataset is created, suitable to develop and test visual features which are able to represent music-related information. Evaluations range from low-level visual features to high-level concepts retrieved by means of Deep Convolutional Neural Networks. Additionally, new visual features are introduced capturing rhythmic visual patterns. In all of these experiments the audio-based results serve as benchmark for the visual and audio-visual approaches. The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification and Cross-Genre Classification. Experiments show that an audio-visual approach harnessing high-level semantic information gained from visual concept detection outperforms audio-only genre-classification accuracy by 16.43%.
Tasks Information Retrieval, Music Information Retrieval
Published 2020-02-01
URL https://arxiv.org/abs/2002.00251v1
PDF https://arxiv.org/pdf/2002.00251v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-music-information-retrieval

CRNet: Cross-Reference Networks for Few-Shot Segmentation

Title CRNet: Cross-Reference Networks for Few-Shot Segmentation
Authors Weide Liu, Chi Zhang, Guosheng Lin, Fayao Liu
Abstract Over the past few years, state-of-the-art image segmentation algorithms have been based on deep convolutional neural networks. To give a deep network the ability to understand a concept, humans need to collect a large amount of pixel-level annotated data to train the models, which is time-consuming and tedious. Recently, few-shot segmentation has been proposed to solve this problem. Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation. Unlike previous works which only predict the mask in the query image, our proposed model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images, thus helping the few-shot segmentation task. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. For $k$-shot learning, we propose to finetune parts of the network to take advantage of multiple labeled support images. Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
Tasks Semantic Segmentation
Published 2020-03-24
URL https://arxiv.org/abs/2003.10658v1
PDF https://arxiv.org/pdf/2003.10658v1.pdf
PWC https://paperswithcode.com/paper/crnet-cross-reference-networks-for-few-shot

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

Title Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Authors Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste
Abstract We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize equally well as those trained with small mini-batches but are produced in a substantially shorter time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
Published 2020-01-07
URL https://arxiv.org/abs/2001.02312v1
PDF https://arxiv.org/pdf/2001.02312v1.pdf
PWC https://paperswithcode.com/paper/stochastic-weight-averaging-in-parallel-large-1
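The core refinement step SWAP describes, averaging the weights of models trained independently in parallel, reduces to a per-parameter mean. A minimal sketch with toy parameter dictionaries (the names and values are illustrative, not from the paper):

```python
import numpy as np

def average_weights(model_weights):
    """Average parameter arrays across independently refined models,
    key by key (the core averaging step of a SWAP-like scheme)."""
    keys = model_weights[0].keys()
    return {k: np.mean([w[k] for w in model_weights], axis=0) for k in keys}

# Three hypothetical models, each represented as a dict of parameter arrays.
models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([3.0])},
    {"w": np.array([2.0, 0.0]), "b": np.array([0.0])},
]
averaged = average_weights(models)  # {"w": [2.0, 2.0], "b": [1.0]}
```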

VC-dimensions of nondeterministic finite automata for words of equal length

Title VC-dimensions of nondeterministic finite automata for words of equal length
Authors Bjørn Kjos-Hanssen, Clyde James Felix, Sun Young Kim, Ethan Lamb, Davin Takahashi
Abstract Ishigami and Tani studied VC-dimensions of deterministic finite automata. We obtain analogous results for the nondeterministic case by extending a result of Champarnaud and Pin, who proved that the maximal deterministic state complexity of a set of binary words of length $n$ is $\sum_{i=0}^n \min(2^i, 2^{2^{n-i}}-1)$. We show that for the nondeterministic case, if we fully restrict attention to words of length $n$, then we need at most the strictly increasing initial terms in this sum.
Published 2020-01-07
URL https://arxiv.org/abs/2001.02309v1
PDF https://arxiv.org/pdf/2001.02309v1.pdf
PWC https://paperswithcode.com/paper/vc-dimensions-of-nondeterministic-finite
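The Champarnaud-Pin sum quoted in the abstract is easy to evaluate for small $n$, which also makes visible where its terms stop increasing:

```python
def champarnaud_pin_bound(n):
    """Maximal deterministic state complexity of a set of binary words of
    length n, per Champarnaud and Pin: sum over i of min(2^i, 2^(2^(n-i)) - 1)."""
    return sum(min(2**i, 2**(2**(n - i)) - 1) for i in range(n + 1))

# n=2: min terms are 1, 2, 1 -> 4;  n=3: 1, 2, 3, 1 -> 7
values = [champarnaud_pin_bound(n) for n in range(5)]  # [1, 2, 4, 7, 11]
```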

Hierarchical Classification of Enzyme Promiscuity Using Positive, Unlabeled, and Hard Negative Examples

Title Hierarchical Classification of Enzyme Promiscuity Using Positive, Unlabeled, and Hard Negative Examples
Authors Gian Marco Visani, Michael C. Hughes, Soha Hassoun
Abstract Despite significant progress in sequencing technology, there are many cellular enzymatic activities that remain unknown. We develop a new method, referred to as SUNDRY (Similarity-weighting for UNlabeled Data in a Residual HierarchY), for training enzyme-specific predictors that take as input a query substrate molecule and return whether the enzyme would act on that substrate or not. When addressing this enzyme promiscuity prediction problem, a major challenge is the lack of abundant labeled data, especially the shortage of labeled data for negative cases (enzyme-substrate pairs where the enzyme does not act to transform the substrate to a product molecule). To overcome this issue, our proposed method can learn to classify a target enzyme by sharing information from related enzymes via known tree hierarchies. Our method can also incorporate three types of data: those molecules known to be catalyzed by an enzyme (positive cases), those with unknown relationships (unlabeled cases), and molecules labeled as inhibitors for the enzyme. We refer to inhibitors as hard negative cases because they may be difficult to classify well: they bind to the enzyme, like positive cases, but are not transformed by the enzyme. Our method uses confidence scores derived from structural similarity to treat unlabeled examples as weighted negatives. We compare our proposed hierarchy-aware predictor against a baseline that cannot share information across related enzymes. Using data from the BRENDA database, we show that each of our contributions (hierarchical sharing, per-example confidence weighting of unlabeled data based on molecular similarity, and including inhibitors as hard-negative examples) contributes towards a better characterization of enzyme promiscuity.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07327v1
PDF https://arxiv.org/pdf/2002.07327v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-classification-of-enzyme
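The confidence weighting of unlabeled examples can be sketched with a simple rule: the more structurally similar an unlabeled molecule is to a known positive, the less weight it receives as a negative. This is an illustrative rule, not the paper's exact formula, and the similarity values are toy data.

```python
import numpy as np

def unlabeled_negative_weights(sim_to_positives):
    """Treat unlabeled molecules as weighted negatives: down-weight those
    whose maximum similarity to any known positive substrate is high."""
    max_sim = np.asarray(sim_to_positives).max(axis=1)
    return 1.0 - max_sim  # high similarity to positives -> low negative weight

# Rows: unlabeled molecules; columns: similarity (in [0, 1]) to each known positive.
sims = np.array([[0.9, 0.2],
                 [0.1, 0.3]])
weights = unlabeled_negative_weights(sims)  # approximately [0.1, 0.7]
```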

Facial Emotions Recognition using Convolutional Neural Net

Title Facial Emotions Recognition using Convolutional Neural Net
Authors Faisal Ghaffar
Abstract Human beings display their emotions using facial expressions. For humans it is very easy to recognize these emotions, but for a computer it is very challenging. Facial expressions vary from person to person, and the brightness, contrast and resolution of each image differ, which is why recognizing facial expressions is very difficult. Facial expression recognition is an active research area. In this project, we worked on recognition of seven basic human emotions: angry, disgust, fear, happy, sad, surprise and neutral. Every image was first passed through a face detection algorithm before inclusion in the training dataset. Since CNNs require a large amount of data, we augmented our data by applying various filters to each image. The system is trained using a CNN architecture: preprocessed images of size 80x100 are passed as input to the first layer, followed by three convolutional layers, each followed by a pooling layer, and then three dense layers. The dropout rate for the dense layers was 20%. The model was trained on a combination of two publicly available datasets, JAFFED and KDEF; 90% of the data was used for training and 10% for testing. We achieved a maximum accuracy of 78% using the combined dataset.
Tasks Face Detection, Facial Expression Recognition
Published 2020-01-06
URL https://arxiv.org/abs/2001.01456v1
PDF https://arxiv.org/pdf/2001.01456v1.pdf
PWC https://paperswithcode.com/paper/facial-emotions-recognition-using
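The shape bookkeeping for an architecture like this can be traced quickly. The abstract gives neither kernel nor pooling sizes, so the sketch below assumes illustrative defaults: 3x3 convolutions without padding and 2x2 max-pooling after each of the three convolutional layers.

```python
def feature_map_size(h, w, n_blocks=3, kernel=3, pool=2):
    """Trace the spatial size through n_blocks of (valid conv -> max-pool)."""
    for _ in range(n_blocks):
        h, w = h - (kernel - 1), w - (kernel - 1)  # valid convolution shrinks by k-1
        h, w = h // pool, w // pool                # pooling floors the division
    return h, w

# 80x100 -> 78x98 -> 39x49 -> 37x47 -> 18x23 -> 16x21 -> 8x10
final = feature_map_size(80, 100)  # (8, 10)
```

The 8x10 map (times the last layer's channel count) would then be flattened into the first dense layer.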

Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables

Title Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables
Authors Weonyoung Joo, Dongjun Kim, Seungjae Shin, Il-Chul Moon
Abstract Estimating the gradients of stochastic nodes is one of the crucial research questions in the deep generative modeling community. This estimation problem becomes further complex when we regard the stochastic nodes as discrete, because pathwise derivative techniques cannot be applied. Hence, the gradient estimation requires score function methods or continuous relaxation of the discrete random variables. This paper proposes a general version of the Gumbel-Softmax estimator with continuous relaxation, and this estimator is able to relax the discreteness of probability distributions covering broader types than current practice. In detail, we utilize the truncation of discrete random variables and the Gumbel-Softmax trick with a linear transformation for the relaxation. The proposed approach enables the relaxed discrete random variable to be reparameterized and to backpropagate through a large-scale stochastic neural network. Our experiments consist of synthetic data analyses, which show the efficacy of our methods, and topic model analyses, which demonstrate the value of the proposed estimator in practice.
Published 2020-03-04
URL https://arxiv.org/abs/2003.01847v1
PDF https://arxiv.org/pdf/2003.01847v1.pdf
PWC https://paperswithcode.com/paper/generalized-gumbel-softmax-gradient-estimator
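The paper generalizes the standard Gumbel-Softmax trick, which itself is compact enough to sketch: perturb the logits with Gumbel noise, then apply a tempered softmax. The sketch below shows only that base trick, not the paper's generalization, and the logits are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Draw one relaxed sample from a categorical distribution: perturb the
    logits with Gumbel(0, 1) noise, then apply a softmax with temperature tau."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))       # Gumbel(0, 1) noise via inverse transform
    y = (logits + gumbel) / tau        # lower tau -> closer to a one-hot sample
    y = y - y.max()                    # subtract max for numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

sample = gumbel_softmax(np.log(np.array([0.2, 0.5, 0.3])))
```

As tau approaches 0 the sample approaches a one-hot vector; because the map is differentiable in the logits, gradients can flow through the sampling step.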

Deflecting Adversarial Attacks

Title Deflecting Adversarial Attacks
Authors Yao Qin, Nicholas Frosst, Colin Raffel, Garrison Cottrell, Geoffrey Hinton
Abstract There has been an ongoing cycle where stronger defenses against adversarial attacks are subsequently broken by a more advanced defense-aware attack. We present a new approach towards ending this cycle where we “deflect” adversarial attacks by causing the attacker to produce an input that semantically resembles the attack’s target class. To this end, we first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance on both standard and defense-aware attacks. We then show that undetected attacks against our defense often perceptually resemble the adversarial target class by performing a human study where participants are asked to label images produced by the attack. These attack images can no longer be called “adversarial” because our network classifies them the same way as humans do.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07405v1
PDF https://arxiv.org/pdf/2002.07405v1.pdf
PWC https://paperswithcode.com/paper/deflecting-adversarial-attacks

ECSP: A New Task for Emotion-Cause Span-Pair Extraction and Classification

Title ECSP: A New Task for Emotion-Cause Span-Pair Extraction and Classification
Authors Hongliang Bi, Pengyuan Liu
Abstract Emotion cause analysis, such as emotion cause extraction (ECE) and emotion-cause pair extraction (ECPE), has gradually attracted the attention of many researchers. However, there are still two shortcomings in the existing research: 1) in most cases, the emotion expression and its cause are not whole clauses but spans within clauses, so extracting clause-pairs rather than span-pairs greatly limits applications in real-world scenarios; 2) it is not enough to extract the emotion expression clause without identifying the emotion category, since the presence of an emotion clause does not necessarily convey emotional information explicitly due to different possible causes. In this paper, we propose a new task: Emotion-Cause Span-Pair extraction and classification (ECSP), which aims to extract the potential span-pairs of emotions and their corresponding causes in a document, and to classify the emotion of each pair. In the new ECSP task, ECE and ECPE can be regarded as two special cases at the clause level. We propose a span-based extract-then-classify (ETC) model, where emotions and causes are directly extracted and paired from the document under the supervision of target span boundaries, and the corresponding categories are then classified using their pair representations and localized context. Experiments show that our proposed ETC model outperforms the SOTA models on the ECE and ECPE tasks respectively and achieves reasonable results on the ECSP task.
Tasks Emotion Classification
Published 2020-03-07
URL https://arxiv.org/abs/2003.03507v1
PDF https://arxiv.org/pdf/2003.03507v1.pdf
PWC https://paperswithcode.com/paper/ecsp-a-new-task-for-emotion-cause-span-pair

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

Title x-vectors meet emotions: A study on dependencies between emotion and speaker recognition
Authors Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, Nanxin Chen, Najim Dehak
Abstract In this work, we explore the dependencies between speaker recognition and emotion recognition. We first show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning. Then, we show the effect of emotion on speaker recognition. For emotion recognition, we show that using a simple linear model is enough to obtain good performance on features extracted from pre-trained models such as the x-vector model. Then, we improve emotion recognition performance by fine-tuning for emotion classification. We evaluated our experiments on three different types of datasets: IEMOCAP, MSP-Podcast, and Crema-D. By fine-tuning, we obtained 30.40%, 7.99%, and 8.61% absolute improvements on IEMOCAP, MSP-Podcast, and Crema-D respectively over a baseline model with no pre-training. Finally, we present results on the effect of emotion on speaker verification. We observed that speaker verification performance is sensitive to the emotions of the test speakers; trials with angry utterances performed worst in all three datasets. We hope our analysis will initiate a new line of research in the speaker recognition community.
Tasks Emotion Classification, Emotion Recognition, Speaker Recognition, Speaker Verification, Transfer Learning
Published 2020-02-12
URL https://arxiv.org/abs/2002.05039v1
PDF https://arxiv.org/pdf/2002.05039v1.pdf
PWC https://paperswithcode.com/paper/x-vectors-meet-emotions-a-study-on