February 1, 2020

Paper Group AWR 109

A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation

Title A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Authors Varun Khare, Divyat Mahajan, Homanga Bharadhwaj, Vinay Verma, Piyush Rai
Abstract We present a domain-adaptation-based generative framework for zero-shot learning. Our framework addresses the problem of domain shift between the seen and unseen class distributions in zero-shot learning and minimizes the shift by developing a generative model trained via adversarial domain adaptation. Our approach is based on end-to-end learning of the class distributions of seen classes and unseen classes. To enable the model to learn the class distributions of unseen classes, we parameterize these class distributions in terms of the class attribute information (which is available for both seen and unseen classes). This provides a very simple way to learn the class distribution of any unseen class, given only its class attribute information, and no labeled training data. Training this model with adversarial domain adaptation further provides robustness against the distribution mismatch between the data from seen and unseen classes. Our approach also provides a novel way to train neural-net-based classifiers that overcomes the hubness problem in zero-shot learning. Through a comprehensive set of experiments, we show that our model yields superior accuracy compared to various state-of-the-art zero-shot learning models on a variety of benchmark datasets. Code for the experiments is available at github.com/vkkhare/ZSL-ADA
Tasks Domain Adaptation, Zero-Shot Learning
Published 2019-06-07
URL https://arxiv.org/abs/1906.03038v3
PDF https://arxiv.org/pdf/1906.03038v3.pdf
PWC https://paperswithcode.com/paper/a-generative-framework-for-zero-shot-learning
Repo https://github.com/vkkhare/ZSL-ADA
Framework none
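
The core trick, parameterizing each class-conditional distribution by its attribute vector so that unseen classes get a distribution "for free", is easy to sketch. Below is a minimal, hypothetical PyTorch illustration (the attribute and feature dimensions are stand-ins), not the authors' full ZSL-ADA pipeline with adversarial domain adaptation:

```python
import torch
import torch.nn as nn

class AttributeConditionedGaussian(nn.Module):
    """Maps a class-attribute vector to the mean and log-variance of a
    class-conditional Gaussian over visual features (illustrative only)."""
    def __init__(self, attr_dim, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, feat_dim)
        self.logvar = nn.Linear(hidden, feat_dim)

    def forward(self, attrs):
        h = self.net(attrs)
        return self.mu(h), self.logvar(h)

    def sample(self, attrs, n_per_class):
        # Reparameterized sampling of synthetic features per class; a
        # classifier trained on these needs no unseen-class images.
        mu, logvar = self.forward(attrs)
        mu = mu.repeat_interleave(n_per_class, dim=0)
        std = (0.5 * logvar).exp().repeat_interleave(n_per_class, dim=0)
        return mu + std * torch.randn_like(std)

# Hypothetical sizes: 85-d attributes, 2048-d CNN features, 10 unseen classes.
gen = AttributeConditionedGaussian(attr_dim=85, feat_dim=2048)
unseen_attrs = torch.randn(10, 85)         # stand-in attribute vectors
fake_feats = gen.sample(unseen_attrs, 50)  # 500 synthetic unseen-class features
```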

Revisiting Graph Neural Networks: All We Have is Low-Pass Filters

Title Revisiting Graph Neural Networks: All We Have is Low-Pass Filters
Authors Hoang NT, Takanori Maehara
Abstract Graph neural networks have become one of the most important techniques to solve machine learning problems on graph-structured data. Recent work on vertex classification proposed deep and distributed learning models to achieve high performance and scalability. However, we find that the feature vectors of benchmark datasets are already quite informative for the classification task, and the graph structure only provides a means to denoise the data. In this paper, we develop a theoretical framework based on graph signal processing for analyzing graph neural networks. Our results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property. We further investigate their resilience to feature noise and propose some insights on GCN-based graph neural network design.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.09550v2
PDF https://arxiv.org/pdf/1905.09550v2.pdf
PWC https://paperswithcode.com/paper/revisiting-graph-neural-networks-all-we-have
Repo https://github.com/gear/gfnn
Framework pytorch
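
The paper's takeaway, that the graph mainly acts as a low-pass denoising filter on the features, suggests a simple filter-then-classify pipeline. The sketch below applies the augmented normalized adjacency k times (SGC/gfNN-style) to a dense feature matrix; it is an illustrative reading of the idea, not the authors' exact model:

```python
import numpy as np

def low_pass_filter(adj, feats, k=2):
    """Apply the augmented normalized adjacency (a low-pass graph filter)
    k times to the node feature matrix, SGC/gfNN-style."""
    a = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # D^{-1/2}
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        feats = a_norm @ feats
    return feats  # denoised features; any classifier (e.g. an MLP) follows

adj = (np.random.rand(100, 100) < 0.05).astype(float)
adj = np.maximum(adj, adj.T)                   # symmetrize a random graph
smoothed = low_pass_filter(adj, np.random.randn(100, 16))
```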

Human activity recognition from skeleton poses

Title Human activity recognition from skeleton poses
Authors Frederico Belmonte Klein, Angelo Cangelosi
Abstract Human Action Recognition is an important task of Human Robot Interaction as cooperation between robots and humans requires that artificial agents recognise complex cues from the environment. A promising approach is using trained classifiers to recognise human actions through sequences of skeleton poses extracted from images or RGB-D data from a sensor. However, with many different data-sets focused on slightly different sets of actions and different algorithms it is not clear which strategy produces highest accuracy for indoor activities performed in a home environment. This work discussed, tested and compared classic algorithms, namely, support vector machines and k-nearest neighbours, to 2 similar hierarchical neural gas approaches, the growing when required neural gas and the growing neural gas.
Tasks Activity Recognition, Human Activity Recognition, Temporal Action Localization
Published 2019-08-20
URL https://arxiv.org/abs/1908.08928v1
PDF https://arxiv.org/pdf/1908.08928v1.pdf
PWC https://paperswithcode.com/paper/human-activity-recognition-from-skeleton
Repo https://github.com/frederico-klein/cad-gas
Framework none
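
The classic baselines the paper compares are straightforward to reproduce. Here is a minimal scikit-learn sketch on hypothetical flattened pose sequences; the neural gas variants have no standard library implementation and are omitted:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: 200 sequences of 30 frames, 15 joints x 3 coords, flattened.
X = np.random.randn(200, 30 * 15 * 3)
y = np.random.randint(0, 5, size=200)  # 5 stand-in activity labels

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```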

Multivariate Time Series Classification using Dilated Convolutional Neural Network

Title Multivariate Time Series Classification using Dilated Convolutional Neural Network
Authors Omolbanin Yazdanbakhsh, Scott Dick
Abstract Multivariate time series classification is a high-value and well-known problem in the machine learning community. Feature extraction is a key step in classification tasks. Traditional approaches employ hand-crafted features for classification, whereas convolutional neural networks (CNNs) can extract features automatically. In this paper, we use a dilated convolutional neural network for multivariate time series classification. To deploy the dilated CNN, a multivariate time series is transformed into an image-like representation, and stacks of dilated and strided convolutions are applied to simultaneously extract features within and between the variates of the time series. We evaluate our model on two human activity recognition time series, finding that the automatically extracted features can be as effective as hand-crafted ones.
Tasks Activity Recognition, Human Activity Recognition, Time Series, Time Series Classification
Published 2019-05-05
URL https://arxiv.org/abs/1905.01697v1
PDF https://arxiv.org/pdf/1905.01697v1.pdf
PWC https://paperswithcode.com/paper/multivariate-time-series-classification-using-1
Repo https://github.com/SonbolYb/multivariate_timeseries_dilated_conv
Framework tf
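
The idea of stacking dilated convolutions over a variates-by-time "image" fits in a few lines. The PyTorch model below uses exponentially growing dilations and global average pooling; the layer widths and dilation schedule are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DilatedTSClassifier(nn.Module):
    """Stacked dilated 1-D convolutions over a multivariate series
    (channels = variates); a rough sketch of the paper's idea."""
    def __init__(self, n_vars, n_classes):
        super().__init__()
        layers, ch = [], n_vars
        for d in (1, 2, 4, 8):  # exponentially growing dilation
            layers += [nn.Conv1d(ch, 64, kernel_size=3, dilation=d,
                                 padding=d), nn.ReLU()]
            ch = 64
        self.conv = nn.Sequential(*layers)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):             # x: (batch, n_vars, time)
        h = self.conv(x).mean(dim=2)  # global average pooling over time
        return self.head(h)

model = DilatedTSClassifier(n_vars=6, n_classes=4)
logits = model(torch.randn(8, 6, 128))  # 8 series, 6 variates, 128 steps
```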

Semantic Image Synthesis with Spatially-Adaptive Normalization

Title Semantic Image Synthesis with Spatially-Adaptive Normalization
Authors Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu
Abstract We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout to modulate the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantics and style. Code is available at https://github.com/NVlabs/SPADE
Tasks Image Generation, Image-to-Image Translation
Published 2019-03-18
URL https://arxiv.org/abs/1903.07291v2
PDF https://arxiv.org/pdf/1903.07291v2.pdf
PWC https://paperswithcode.com/paper/semantic-image-synthesis-with-spatially
Repo https://github.com/Dominioncher/smart-sketch
Framework pytorch
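
The layer itself is compact: normalize without learned affine parameters, then modulate with a per-pixel scale and shift predicted from the (resized) segmentation map. A minimal sketch of this mechanism, with the hidden width and kernel sizes as assumptions (see the official repo for the real implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADELayer(nn.Module):
    """Sketch of spatially-adaptive normalization: normalize activations,
    then modulate with per-pixel scale/shift predicted from the layout."""
    def __init__(self, n_channels, n_labels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(n_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(n_labels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, n_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, n_channels, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the one-hot segmentation map to the activation's resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

layer = SPADELayer(n_channels=64, n_labels=20)
out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 20, 256, 256))
```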

Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings

Title Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings
Authors Zhaoning Li, Qi Li, Xiaotian Zou, Jiangtao Ren
Abstract Causality extraction from natural language texts is a challenging open problem in artificial intelligence. Existing methods utilize patterns, constraints, and machine learning techniques to extract causality; they depend heavily on domain knowledge and require considerable human effort and time for feature engineering. In this paper, we formulate causality extraction as a sequence tagging problem based on a novel causality tagging scheme. On this basis, we propose a neural causality extractor with a BiLSTM-CRF model as the backbone, named SCIFI (Self-Attentive BiLSTM-CRF with Flair Embeddings), which can directly extract Cause and Effect without first extracting candidate causal pairs and identifying their relations separately. To tackle the problem of data insufficiency, we transfer contextual string embeddings, also known as Flair embeddings, trained on a large corpus, into our task. In addition, to improve the performance of causality extraction, we introduce a multi-head self-attention mechanism into SCIFI to learn the dependencies between causal words. We evaluate our method on a public dataset, and experimental results demonstrate that it achieves significant and consistent improvements over other baselines.
Tasks Feature Engineering
Published 2019-04-16
URL https://arxiv.org/abs/1904.07629v4
PDF https://arxiv.org/pdf/1904.07629v4.pdf
PWC https://paperswithcode.com/paper/causality-extraction-based-on-self-attentive
Repo https://github.com/Das-Boot/scifi
Framework none
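
A rough sketch of the tagging backbone: a BiLSTM encoder plus multi-head self-attention producing per-token scores over a cause/effect tag set. The CRF decoding layer and Flair embeddings are omitted here, and the tag set and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """BiLSTM emission scores for a cause/effect tagging scheme; a CRF
    layer (omitted here) would normally decode the best tag sequence."""
    def __init__(self, vocab_size, n_tags, emb=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, tokens):                 # tokens: (batch, seq)
        h, _ = self.lstm(self.emb(tokens))
        h, _ = self.attn(h, h, h)              # multi-head self-attention
        return self.out(h)                     # per-token tag scores

# Hypothetical tag set: B/I-Cause, B/I-Effect, O.
tagger = BiLSTMTagger(vocab_size=5000, n_tags=5)
scores = tagger(torch.randint(0, 5000, (4, 20)))
```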

Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images

Title Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images
Authors Bin Zhang, Shenyao Jin, Yili Xia, Yongming Huang, Zixiang Xiong
Abstract Deep learning based image denoising methods have been extensively investigated. In this paper, attention mechanism enhanced kernel prediction networks (AME-KPNs) are proposed for burst image denoising, in which nearly cost-free attention modules first refine the feature maps and then make full use of the inter-frame and intra-frame redundancies within the whole image burst. The proposed AME-KPNs output per-pixel spatially-adaptive kernels, residual maps, and corresponding weight maps: the predicted kernels roughly restore clean pixels at their corresponding locations via an adaptive convolution operation, and the residuals are then weighted and summed to compensate for the limited receptive field of the predicted kernels. Simulations and real-world experiments illustrate the robustness of the proposed AME-KPNs in burst image denoising.
Tasks Denoising, Image Denoising
Published 2019-10-18
URL https://arxiv.org/abs/1910.08313v2
PDF https://arxiv.org/pdf/1910.08313v2.pdf
PWC https://paperswithcode.com/paper/attention-mechanism-enhanced-kernel
Repo https://github.com/z-bingo/Attention-Mechanism-Enhanced-KPN
Framework pytorch
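
The kernel-prediction step, applying a predicted k x k kernel at every pixel of every burst frame, is the heart of such models and can be written with F.unfold. A sketch under assumed shapes (grayscale frames, softmax-normalized kernels; the attention modules and residual branch are omitted):

```python
import torch
import torch.nn.functional as F

def apply_per_pixel_kernels(burst, kernels):
    """Apply predicted per-pixel k x k kernels to each frame of a burst
    and average the results -- the core of kernel-prediction denoising.
    burst:   (B, N, H, W) grayscale frames
    kernels: (B, N, k*k, H, W) softmax-normalized per-pixel kernels
    """
    b, n, h, w = burst.shape
    k2 = kernels.shape[2]
    k = int(k2 ** 0.5)
    patches = F.unfold(burst.reshape(b * n, 1, h, w),
                       kernel_size=k, padding=k // 2)     # (B*N, k*k, H*W)
    patches = patches.reshape(b, n, k2, h, w)
    return (patches * kernels).sum(dim=2).mean(dim=1)     # (B, H, W)

burst = torch.randn(2, 8, 64, 64)                          # 8-frame burst
kernels = torch.softmax(torch.randn(2, 8, 25, 64, 64), dim=2)  # 5x5 kernels
clean = apply_per_pixel_kernels(burst, kernels)
```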

Multiple Human Tracking using Multi-Cues including Primitive Action Features

Title Multiple Human Tracking using Multi-Cues including Primitive Action Features
Authors Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase
Abstract In this paper, we propose a multiple human tracking method using multi-cues including primitive action features (MHT-PAF). MHT-PAF performs accurate human tracking in dynamic aerial videos captured by a drone. PAF employs global context, the rich information carried by multi-label actions, and mid-level features. The accurate tracking results obtained with PAF in turn help multi-frame-based action recognition. In the experiments, we verified the effectiveness of the proposed method on the Okutama-Action dataset. Our code is available online.
Tasks
Published 2019-09-18
URL https://arxiv.org/abs/1909.08171v1
PDF https://arxiv.org/pdf/1909.08171v1.pdf
PWC https://paperswithcode.com/paper/multiple-human-tracking-using-multi-cues
Repo https://github.com/hitottiez/mht-paf
Framework none

Learning from Synthetic Data for Crowd Counting in the Wild

Title Learning from Synthetic Data for Crowd Counting in the Wild
Authors Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan
Abstract Recently, counting the number of people in crowd scenes has become a hot topic because of its widespread applications (e.g., video surveillance, public security). It is a difficult task in the wild: changeable environments and large variations in crowd size prevent current methods from working well. In addition, due to scarce data, many methods suffer from over-fitting to differing extents. To remedy these two problems, we first develop a data collector and labeler that can generate synthetic crowd scenes and annotate them automatically, without any manpower. Based on it, we build a large-scale, diverse synthetic dataset. Second, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly improves the model’s performance on real data; 2) a crowd counting method via domain adaptation, which frees humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/.
Tasks Crowd Counting, Domain Adaptation
Published 2019-03-08
URL http://arxiv.org/abs/1903.03303v1
PDF http://arxiv.org/pdf/1903.03303v1.pdf
PWC https://paperswithcode.com/paper/learning-from-synthetic-data-for-crowd
Repo https://github.com/gjy3035/GCC-SFCN
Framework pytorch
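
Scheme 1 (pretrain on synthetic, finetune on real) boils down to two training passes with different data and learning rates. A heavily simplified sketch: the CrowdCounter toy network and the data loaders are hypothetical stand-ins, not the paper's SFCN:

```python
import torch
import torch.nn as nn

class CrowdCounter(nn.Module):
    """Toy density-map regressor standing in for the paper's counter."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))               # per-pixel density estimate

    def forward(self, x):
        return self.net(x)

def train_pass(model, loader, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                     # density-map regression loss
    for imgs, density in loader:
        opt.zero_grad()
        loss_fn(model(imgs), density).backward()
        opt.step()

model = CrowdCounter()
# train_pass(model, synthetic_loader, lr=1e-4)  # 1) pretrain on synthetic data
# train_pass(model, real_loader, lr=1e-5)       # 2) finetune on real data
```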

Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance

Title Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance
Authors Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki, Kentaro Inui
Abstract The writing process consists of several stages, such as drafting, revising, editing, and proofreading. Studies on writing assistance, such as grammatical error correction (GEC), have mainly focused on sentence editing and proofreading, where surface-level issues such as typographical, spelling, or grammatical errors should be corrected. We broaden this focus to include the earlier revising stage, where sentences require adjustment of the information they include or major rewriting, and propose Sentence-level Revision (SentRev) as a new writing assistance task. Well-performing systems on this task can help inexperienced authors by producing fluent, complete sentences from their rough, incomplete drafts. For developing and evaluating SentRev models, we build a new, freely available crowdsourced evaluation dataset consisting of incomplete sentences authored by non-native writers, paired with their final versions extracted from published academic papers. We also establish baseline performance on SentRev using our newly built evaluation dataset.
Tasks Grammatical Error Correction
Published 2019-10-21
URL https://arxiv.org/abs/1910.09180v1
PDF https://arxiv.org/pdf/1910.09180v1.pdf
PWC https://paperswithcode.com/paper/diamonds-in-the-rough-generating-fluent
Repo https://github.com/taku-ito/INLG2019_SentRev
Framework none

MultiGrain: a unified image embedding for classes and instances

Title MultiGrain: a unified image embedding for classes and instances
Authors Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze
Abstract MultiGrain is a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized based on the object class, the particular object, or whether they are distorted copies. Our joint training is simple: we minimize a cross-entropy loss for classification and a ranking loss that determines if two images are identical up to data augmentation, with no need for additional labels. A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution. When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1 accuracy with a ResNet-50 trained on ImageNet, a +1.8% absolute improvement over the AutoAugment method. When compared using cosine similarity, the same embeddings perform on par with the state of the art for image retrieval at moderate resolutions.
Tasks Data Augmentation, Image Classification, Image Retrieval
Published 2019-02-14
URL http://arxiv.org/abs/1902.05509v2
PDF http://arxiv.org/pdf/1902.05509v2.pdf
PWC https://paperswithcode.com/paper/multigrain-a-unified-image-embedding-for
Repo https://github.com/facebookresearch/multigrain
Framework pytorch
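
The pooling layer MultiGrain relies on is generalized-mean (GeM) pooling, which lets a trunk trained at low resolution exploit higher-resolution inputs. A minimal sketch, where p = 3 is a common choice and the details should be treated as illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeMPooling(nn.Module):
    """Generalized-mean pooling: p = 1 recovers average pooling and
    p -> infinity approaches max pooling."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps

    def forward(self, x):  # x: (B, C, H, W) trunk feature maps
        return F.avg_pool2d(x.clamp(min=self.eps).pow(self.p),
                            x.shape[2:]).pow(1.0 / self.p).flatten(1)

pool = GeMPooling()
emb = F.normalize(pool(torch.randn(4, 2048, 7, 7)), dim=1)  # unit embeddings
```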

Language-Agnostic Syllabification with Neural Sequence Labeling

Title Language-Agnostic Syllabification with Neural Sequence Labeling
Authors Jacob Krantz, Maxwell Dulin, Paul De Palma
Abstract The identification of syllables within phonetic sequences is known as syllabification. This task is thought to play an important role in natural language understanding, speech production, and the development of speech recognition systems. The concept of the syllable is cross-linguistic, though formal definitions are rarely agreed upon, even within a language. In response, data-driven syllabification methods have been developed to learn from syllabified examples. These methods often employ classical machine learning sequence labeling models. In recent years, recurrence-based neural networks have been shown to perform increasingly well for sequence labeling tasks such as named entity recognition (NER), part of speech (POS) tagging, and chunking. We present a novel approach to the syllabification problem which leverages modern neural network techniques. Our network is constructed with long short-term memory (LSTM) cells, a convolutional component, and a conditional random field (CRF) output layer. Existing syllabification approaches are rarely evaluated across multiple language families. To demonstrate cross-linguistic generalizability, we show that the network is competitive with state of the art systems in syllabifying English, Dutch, Italian, French, Manipuri, and Basque datasets.
Tasks Chunking, Named Entity Recognition, Part-Of-Speech Tagging, Speech Recognition
Published 2019-09-29
URL https://arxiv.org/abs/1909.13362v1
PDF https://arxiv.org/pdf/1909.13362v1.pdf
PWC https://paperswithcode.com/paper/language-agnostic-syllabification-with-neural
Repo https://github.com/jacobkrantz/lstm-syllabify
Framework none
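
Framed as sequence labeling, the data preparation is simple: each phone gets a label marking whether it begins a new syllable. A small sketch with a hypothetical ARPAbet-style segmentation:

```python
def syllable_boundary_labels(phones, syllables):
    """Convert a syllabified pronunciation into per-phone labels:
    1 marks a phone that starts a new syllable, 0 otherwise."""
    labels = []
    for syl in syllables:
        for i, _ in enumerate(syl):
            labels.append(1 if i == 0 else 0)
    assert len(labels) == len(phones)
    return labels

# ARPAbet-style example for "syllable" (hypothetical segmentation):
phones = ["S", "IH", "L", "AH", "B", "AH", "L"]
syllables = [["S", "IH"], ["L", "AH"], ["B", "AH", "L"]]
print(syllable_boundary_labels(phones, syllables))  # [1, 0, 1, 0, 1, 0, 0]
```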

GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs

Title GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs
Authors Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou
Abstract Finding local correspondences between images with different viewpoints requires local descriptors that are robust against geometric transformations. An approach for transformation invariance is to integrate out the transformations by pooling the features extracted from transformed versions of an image. However, the feature pooling may sacrifice the distinctiveness of the resulting descriptors. In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations. The key idea is that the features extracted from the transformed versions of an image can be viewed as a function defined on the group of the transformations. Instead of feature pooling, we use group convolutions to exploit underlying structures of the extracted features on the group, resulting in descriptors that are both discriminative and provably invariant to the group of transformations. Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation.
Tasks Pose Estimation
Published 2019-11-14
URL https://arxiv.org/abs/1911.05932v1
PDF https://arxiv.org/pdf/1911.05932v1.pdf
PWC https://paperswithcode.com/paper/gift-learning-transformation-invariant-dense-1
Repo https://github.com/zju3dv/GIFT
Framework pytorch
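
The key idea, treating features from transformed image copies as a function on the transformation group and convolving over the group axes instead of pooling them away, can be gestured at in a few lines. The shapes and the invariant readout below are illustrative assumptions; a faithful implementation would, for instance, pad the rotation axis circularly:

```python
import torch
import torch.nn as nn

# Sketch: per-keypoint features extracted from scaled/rotated image copies
# form a (scale x rotation) grid; convolve over that grid rather than
# pooling it away, to keep the descriptor discriminative.
n_scales, n_rots, feat_dim = 5, 8, 32
group_feats = torch.randn(1, feat_dim, n_scales, n_rots)  # one keypoint

group_conv = nn.Sequential(
    nn.Conv2d(feat_dim, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1))
desc = group_conv(group_feats).amax(dim=(2, 3))  # invariant readout
print(desc.shape)  # torch.Size([1, 64])
```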

Graph-RISE: Graph-Regularized Image Semantic Embedding

Title Graph-RISE: Graph-Regularized Image Semantic Embedding
Authors Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi
Abstract Learning image representations to capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state-of-the-art, differentiates nuances at levels that are closer to human-perception.
Tasks Image Classification, Image Retrieval
Published 2019-02-14
URL http://arxiv.org/abs/1902.10814v1
PDF http://arxiv.org/pdf/1902.10814v1.pdf
PWC https://paperswithcode.com/paper/graph-rise-graph-regularized-image-semantic
Repo https://github.com/tensorflow/neural-structured-learning
Framework tf
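
The "graph-regularized" part amounts to adding a term that pulls the embeddings of graph-neighboring images together, on top of the ordinary classification loss. A toy sketch of that loss shape; the real system trains against O(40M) labels at Google scale, so everything below is illustrative:

```python
import torch
import torch.nn.functional as F

def graph_regularized_loss(emb, logits, labels, edges, alpha=0.1):
    """Classification loss plus a neural-graph-learning-style term that
    pulls the embeddings of neighboring images together.
    edges: (E, 2) long tensor of image-index pairs."""
    ce = F.cross_entropy(logits, labels)
    neigh = F.mse_loss(emb[edges[:, 0]], emb[edges[:, 1]])
    return ce + alpha * neigh

emb = torch.randn(16, 64, requires_grad=True)    # image embeddings
logits = torch.randn(16, 10, requires_grad=True) # class scores
labels = torch.randint(0, 10, (16,))
edges = torch.randint(0, 16, (32, 2))            # stand-in neighbor pairs
loss = graph_regularized_loss(emb, logits, labels, edges)
loss.backward()
```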

Learning Fairness in Multi-Agent Systems

Title Learning Fairness in Multi-Agent Systems
Authors Jiechuan Jiang, Zongqing Lu
Abstract Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is also key for many multi-agent systems. Incorporating fairness into multi-agent learning could help such systems become both efficient and stable. However, learning efficiency and fairness simultaneously is a complex, multi-objective, joint-policy optimization problem. To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model. We first decompose fairness for each agent and propose a fair-efficient reward that each agent learns its own policy to optimize. To avoid multi-objective conflict, we design a hierarchy consisting of a controller and several sub-policies, where the controller maximizes the fair-efficient reward by switching among the sub-policies, which provide diverse behaviors for interacting with the environment. FEN can be trained in a fully decentralized way, making it easy to deploy in real-world applications. Empirically, we show that FEN easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios.
Tasks Hierarchical Reinforcement Learning
Published 2019-10-31
URL https://arxiv.org/abs/1910.14472v1
PDF https://arxiv.org/pdf/1910.14472v1.pdf
PWC https://paperswithcode.com/paper/learning-fairness-in-multi-agent-systems
Repo https://github.com/PKU-AI-Edge/FEN
Framework tf
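
One plausible reading of the decomposed fair-efficient reward: reward high average utility (efficiency) while penalizing an agent's deviation from that average (fairness). The constants and exact normalization below are assumptions, not necessarily the paper's formula:

```python
import numpy as np

def fair_efficient_reward(utilities, c=1.0, eps=0.1):
    """Sketch of a decomposed fair-efficient signal: each agent is rewarded
    for high average utility (efficiency) and penalized as its own utility
    deviates from the average (fairness). Constants are illustrative."""
    u_bar = utilities.mean()
    deviation = np.abs(utilities / max(u_bar, 1e-8) - 1.0)
    return (u_bar / c) / (eps + deviation)   # one reward per agent

print(fair_efficient_reward(np.array([1.0, 1.2, 0.8, 1.0])))
```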