Paper Group AWR 109
A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation. Revisiting Graph Neural Networks: All We Have is Low-Pass Filters. Human activity recognition from skeleton poses. Multivariate Time Series Classification using Dilated Convolutional Neural Network. Semantic Image Synthesis with Spatially-Adaptive Normalization. Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings. …
A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Title | A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation |
Authors | Varun Khare, Divyat Mahajan, Homanga Bharadhwaj, Vinay Verma, Piyush Rai |
Abstract | We present a domain adaptation based generative framework for zero-shot learning. Our framework addresses the problem of domain shift between the seen and unseen class distributions in zero-shot learning and minimizes the shift by developing a generative model trained via adversarial domain adaptation. Our approach is based on end-to-end learning of the class distributions of seen classes and unseen classes. To enable the model to learn the class distributions of unseen classes, we parameterize these class distributions in terms of the class attribute information (which is available for both seen and unseen classes). This provides a very simple way to learn the class distribution of any unseen class, given only its class attribute information, and no labeled training data. Training this model with adversarial domain adaptation further provides robustness against the distribution mismatch between the data from seen and unseen classes. Our approach also provides a novel way for training neural net based classifiers to overcome the hubness problem in zero-shot learning. Through a comprehensive set of experiments, we show that our model yields superior accuracies as compared to various state-of-the-art zero-shot learning models, on a variety of benchmark datasets. Code for the experiments is available at github.com/vkkhare/ZSL-ADA |
Tasks | Domain Adaptation, Zero-Shot Learning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03038v3 |
https://arxiv.org/pdf/1906.03038v3.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-framework-for-zero-shot-learning |
Repo | https://github.com/vkkhare/ZSL-ADA |
Framework | none |
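The abstract's key move is learning unseen-class distributions from attributes alone. As a rough illustration (not the authors' code; all names, sizes, and the Gaussian form are assumptions), a conditioner network can map a class-attribute vector to the parameters of a class-conditional Gaussian, so pseudo-features can be sampled even for classes with no labeled data:

```python
import torch
import torch.nn as nn

class AttributeConditionedGaussian(nn.Module):
    """Maps a class-attribute vector to a diagonal Gaussian over features."""
    def __init__(self, attr_dim, feat_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, feat_dim)      # class mean
        self.logvar = nn.Linear(hidden, feat_dim)  # class log-variance

    def forward(self, attrs):
        h = self.net(attrs)
        return self.mu(h), self.logvar(h)

    def sample(self, attrs):
        # Reparameterization trick: synthesize pseudo-features for any
        # class, seen or unseen, given only its attribute vector.
        mu, logvar = self(attrs)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

gen = AttributeConditionedGaussian(attr_dim=85, feat_dim=2048)
fake_feats = gen.sample(torch.randn(32, 85))  # 85-d attributes, AwA-style
```

In the paper's framework, a generator like this would be trained jointly with an adversarial domain-adaptation objective to close the seen/unseen gap; the sketch above covers only the attribute-to-distribution parameterization.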
Revisiting Graph Neural Networks: All We Have is Low-Pass Filters
Title | Revisiting Graph Neural Networks: All We Have is Low-Pass Filters |
Authors | Hoang NT, Takanori Maehara |
Abstract | Graph neural networks have become one of the most important techniques to solve machine learning problems on graph-structured data. Recent work on vertex classification proposed deep and distributed learning models to achieve high performance and scalability. However, we find that the feature vectors of benchmark datasets are already quite informative for the classification task, and the graph structure only provides a means to denoise the data. In this paper, we develop a theoretical framework based on graph signal processing for analyzing graph neural networks. Our results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property. We further investigate their resilience to feature noise and propose some insights on GCN-based graph neural network design. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09550v2 |
https://arxiv.org/pdf/1905.09550v2.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-graph-neural-networks-all-we-have |
Repo | https://github.com/gear/gfnn |
Framework | pytorch |
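The paper's central claim is that graph convolution acts as a low-pass filter on feature vectors. A minimal sketch of that view (my own simplification, not the gfnn repo's API): propagate features with the augmented normalized adjacency a few times, then hand the smoothed features to any ordinary classifier.

```python
import numpy as np

def low_pass_filter(adj, feats, k=2):
    """Apply (D^-1/2 (A + I) D^-1/2)^k to a dense feature matrix."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        feats = s @ feats                            # each hop denoises/smooths
    return feats
```

Under the paper's analysis, a plain linear model or MLP on `low_pass_filter(adj, feats)` captures most of what a k-layer GCN learns on these benchmarks.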
Human activity recognition from skeleton poses
Title | Human activity recognition from skeleton poses |
Authors | Frederico Belmonte Klein, Angelo Cangelosi |
Abstract | Human Action Recognition is an important task of Human-Robot Interaction, as cooperation between robots and humans requires that artificial agents recognise complex cues from the environment. A promising approach is using trained classifiers to recognise human actions through sequences of skeleton poses extracted from images or RGB-D data from a sensor. However, with many different datasets focused on slightly different sets of actions, and many different algorithms, it is not clear which strategy produces the highest accuracy for indoor activities performed in a home environment. This work discusses, tests, and compares classic algorithms, namely support vector machines and k-nearest neighbours, against two similar hierarchical neural gas approaches: the growing-when-required neural gas and the growing neural gas. |
Tasks | Activity Recognition, Human Activity Recognition, Temporal Action Localization |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.08928v1 |
https://arxiv.org/pdf/1908.08928v1.pdf | |
PWC | https://paperswithcode.com/paper/human-activity-recognition-from-skeleton |
Repo | https://github.com/frederico-klein/cad-gas |
Framework | none |
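A hedged sketch of the kind of comparison the paper runs for the classic baselines (the data loading is hypothetical; the scikit-learn calls are standard):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def compare_classifiers(X, y):
    """X: (n_samples, n_features) flattened skeleton-pose sequences;
    y: action labels."""
    for name, clf in [("SVM", SVC(kernel="rbf")),
                      ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The hierarchical neural gas variants the paper compares against are not part of scikit-learn; see the linked repo for those.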
Multivariate Time Series Classification using Dilated Convolutional Neural Network
Title | Multivariate Time Series Classification using Dilated Convolutional Neural Network |
Authors | Omolbanin Yazdanbakhsh, Scott Dick |
Abstract | Multivariate time series classification is a high-value and well-known problem in the machine learning community. Feature extraction is a key step in classification tasks. Traditional approaches employ hand-crafted features for classification, while convolutional neural networks (CNNs) are able to extract features automatically. In this paper, we use a dilated convolutional neural network for multivariate time series classification. To deploy the dilated CNN, a multivariate time series is transformed into an image-like format, and stacks of dilated and strided convolutions are applied to simultaneously extract features within and between the variates of the time series. We evaluate our model on two human activity recognition time series, finding that the automatically extracted features can be as effective as hand-crafted features. |
Tasks | Activity Recognition, Human Activity Recognition, Time Series, Time Series Classification |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01697v1 |
https://arxiv.org/pdf/1905.01697v1.pdf | |
PWC | https://paperswithcode.com/paper/multivariate-time-series-classification-using-1 |
Repo | https://github.com/SonbolYb/multivariate_timeseries_dilated_conv |
Framework | tf |
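A minimal sketch of the architecture the abstract describes (an assumption on my part rather than the repo's code, which is in TensorFlow): a multivariate series of shape (variates, time) is treated as an image-like array and passed through stacked dilated, strided 1-D convolutions.

```python
import torch
import torch.nn as nn

class DilatedTSClassifier(nn.Module):
    def __init__(self, n_variates, n_classes):
        super().__init__()
        layers, ch = [], n_variates
        for d in (1, 2, 4, 8):            # doubling dilation widens the
            layers += [nn.Conv1d(ch, 64,  # receptive field exponentially
                                 kernel_size=3, dilation=d,
                                 stride=2, padding=d),
                       nn.ReLU()]
            ch = 64
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, variates, time)
        h = self.features(x).mean(dim=-1)  # global average over time
        return self.head(h)

model = DilatedTSClassifier(n_variates=9, n_classes=6)  # HAR-style shapes
logits = model(torch.randn(8, 9, 128))
```

The stacked convolutions mix channels (between-variate features) and time steps (within-variate features) simultaneously, which is the property the abstract emphasizes.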
Semantic Image Synthesis with Spatially-Adaptive Normalization
Title | Semantic Image Synthesis with Spatially-Adaptive Normalization |
Authors | Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu |
Abstract | We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantics and style. Code is available at https://github.com/NVlabs/SPADE . |
Tasks | Image Generation, Image-to-Image Translation |
Published | 2019-03-18 |
URL | https://arxiv.org/abs/1903.07291v2 |
https://arxiv.org/pdf/1903.07291v2.pdf | |
PWC | https://paperswithcode.com/paper/semantic-image-synthesis-with-spatially |
Repo | https://github.com/Dominioncher/smart-sketch |
Framework | pytorch |
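The layer itself is compact. A simplified sketch of a SPADE block consistent with the paper's description (the official implementation at github.com/NVlabs/SPADE differs in details such as the normalization backbone and hidden width):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, norm_channels, label_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization: scale/shift come from the layout.
        self.norm = nn.BatchNorm2d(norm_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, norm_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, norm_channels, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the semantic layout to this activation's resolution, then
        # modulate per pixel, so layout information survives normalization
        # instead of being washed away.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```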
Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings
Title | Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings |
Authors | Zhaoning Li, Qi Li, Xiaotian Zou, Jiangtao Ren |
Abstract | Causality extraction from natural language texts is a challenging open problem in artificial intelligence. Existing methods utilize patterns, constraints, and machine learning techniques to extract causality; they depend heavily on domain knowledge and require considerable human effort and time for feature engineering. In this paper, we formulate causality extraction as a sequence tagging problem based on a novel causality tagging scheme. On this basis, we propose a neural causality extractor with a BiLSTM-CRF model as the backbone, named SCIFI (Self-Attentive BiLSTM-CRF with Flair Embeddings), which can directly extract the Cause and Effect without separately extracting candidate causal pairs and identifying their relations. To tackle the problem of data insufficiency, we transfer contextual string embeddings, also known as Flair embeddings, trained on a large corpus into our task. Besides, to improve the performance of causality extraction, we introduce a multi-head self-attention mechanism into SCIFI to learn the dependencies between causal words. We evaluate our method on a public dataset, and experimental results demonstrate that our method achieves significant and consistent improvement over other baselines. |
Tasks | Feature Engineering |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07629v4 |
https://arxiv.org/pdf/1904.07629v4.pdf | |
PWC | https://paperswithcode.com/paper/causality-extraction-based-on-self-attentive |
Repo | https://github.com/Das-Boot/scifi |
Framework | none |
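A hedged sketch of the SCIFI backbone as the abstract describes it: a BiLSTM with multi-head self-attention producing per-token emission scores. The CRF transition layer and the Flair embedding lookup are omitted for brevity, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMAttnTagger(nn.Module):
    def __init__(self, emb_dim, hidden, n_tags, n_heads=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, n_heads,
                                          batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)  # emission scores for a CRF

    def forward(self, embeddings):       # (batch, seq_len, emb_dim)
        h, _ = self.lstm(embeddings)
        a, _ = self.attn(h, h, h)        # dependencies between causal words
        return self.emit(h + a)          # residual combine, then project

tagger = BiLSTMAttnTagger(emb_dim=100, hidden=128, n_tags=7)
scores = tagger(torch.randn(4, 30, 100))  # feed a CRF decoder in practice
```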
Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images
Title | Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images |
Authors | Bin Zhang, Shenyao Jin, Yili Xia, Yongming Huang, Zixiang Xiong |
Abstract | Deep learning based image denoising methods have been extensively investigated. In this paper, attention mechanism enhanced kernel prediction networks (AME-KPNs) are proposed for burst image denoising, in which nearly cost-free attention modules are adopted, first to refine the feature maps and then to make full use of the inter-frame and intra-frame redundancies within the whole image burst. The proposed AME-KPNs output per-pixel spatially-adaptive kernels, residual maps, and corresponding weight maps; the predicted kernels roughly restore clean pixels at their corresponding locations via an adaptive convolution operation, and the residuals are subsequently weighted and summed to compensate for the limited receptive field of the predicted kernels. Simulations and real-world experiments illustrate the robustness of the proposed AME-KPNs in burst image denoising. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08313v2 |
https://arxiv.org/pdf/1910.08313v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-mechanism-enhanced-kernel |
Repo | https://github.com/z-bingo/Attention-Mechanism-Enhanced-KPN |
Framework | pytorch |
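The adaptive convolution step is the heart of any kernel prediction network. A sketch of its mechanics (my reconstruction, not the repo's API): each pixel's predicted k×k kernel is applied to that pixel's own neighborhood.

```python
import torch
import torch.nn.functional as F

def adaptive_conv(frame, kernels, k=5):
    """frame: (B, C, H, W); kernels: (B, k*k, H, W), e.g. softmaxed per pixel."""
    b, c, h, w = frame.shape
    # Extract the k*k neighborhood around every pixel: (B, C, k*k, H, W).
    patches = F.unfold(frame, k, padding=k // 2).view(b, c, k * k, h, w)
    # Weight each neighborhood by its pixel's predicted kernel and sum.
    return (patches * kernels.unsqueeze(1)).sum(dim=2)
```

In the AME-KPN pipeline this runs per burst frame, and the weighted residual maps are then added to compensate for the kernels' limited receptive field.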
Multiple Human Tracking using Multi-Cues including Primitive Action Features
Title | Multiple Human Tracking using Multi-Cues including Primitive Action Features |
Authors | Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase |
Abstract | In this paper, we propose a Multiple Human Tracking method using multi-cues including Primitive Action Features (MHT-PAF). MHT-PAF performs accurate human tracking in dynamic aerial videos captured by a drone. PAF employs global context, the rich information carried by multi-label actions, and mid-level features. The accurate tracking results obtained using PAF help multi-frame-based action recognition. In the experiments, we verified the effectiveness of the proposed method on the Okutama-Action dataset. Our code is available online. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08171v1 |
https://arxiv.org/pdf/1909.08171v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-human-tracking-using-multi-cues |
Repo | https://github.com/hitottiez/mht-paf |
Framework | none |
Learning from Synthetic Data for Crowd Counting in the Wild
Title | Learning from Synthetic Data for Crowd Counting in the Wild |
Authors | Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan |
Abstract | Recently, counting the number of people in crowd scenes has become a hot topic because of its widespread applications (e.g. video surveillance, public security). It is a difficult task in the wild: changing environments and widely varying crowd sizes prevent current methods from working well. In addition, due to scarce data, many methods suffer from over-fitting to varying extents. To remedy these two problems, firstly, we develop a data collector and labeler that can generate synthetic crowd scenes and simultaneously annotate them without any manpower. Based on it, we build a large-scale, diverse synthetic dataset. Secondly, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly improves the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/. |
Tasks | Crowd Counting, Domain Adaptation |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03303v1 |
http://arxiv.org/pdf/1903.03303v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-synthetic-data-for-crowd |
Repo | https://github.com/gjy3035/GCC-SFCN |
Framework | pytorch |
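Scheme 1 is a plain pretrain-then-finetune recipe. A minimal sketch (the counter, loaders, and hyperparameters are placeholders, not values from the paper):

```python
import torch

def train(model, loader, optimizer, epochs):
    loss_fn = torch.nn.MSELoss()              # density-map regression
    for _ in range(epochs):
        for images, density_maps in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), density_maps)
            loss.backward()
            optimizer.step()

# Pretrain on synthetic GCC crowds, then finetune on real data,
# typically with a smaller learning rate:
# train(counter, synthetic_loader, torch.optim.Adam(counter.parameters(), 1e-4), 50)
# train(counter, real_loader, torch.optim.Adam(counter.parameters(), 1e-5), 20)
```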
Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance
Title | Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance |
Authors | Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki, Kentaro Inui |
Abstract | The writing process consists of several stages such as drafting, revising, editing, and proofreading. Studies on writing assistance, such as grammatical error correction (GEC), have mainly focused on sentence editing and proofreading, where surface-level issues such as typographical, spelling, or grammatical errors should be corrected. We broaden this focus to include the earlier revising stage, where sentences require adjustments to the included information or major rewriting, and we propose Sentence-level Revision (SentRev) as a new writing assistance task. Well-performing systems in this task can help inexperienced authors by producing fluent, complete sentences given their rough, incomplete drafts. For developing and evaluating SentRev models, we build a new freely available crowdsourced evaluation dataset consisting of incomplete sentences authored by non-native writers paired with their final versions extracted from published academic papers. We also establish baseline performance on SentRev using our newly built evaluation dataset. |
Tasks | Grammatical Error Correction |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09180v1 |
https://arxiv.org/pdf/1910.09180v1.pdf | |
PWC | https://paperswithcode.com/paper/diamonds-in-the-rough-generating-fluent |
Repo | https://github.com/taku-ito/INLG2019_SentRev |
Framework | none |
MultiGrain: a unified image embedding for classes and instances
Title | MultiGrain: a unified image embedding for classes and instances |
Authors | Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze |
Abstract | MultiGrain is a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized based on the object class, the particular object, or distorted copies thereof. Our joint training is simple: we minimize a cross-entropy loss for classification and a ranking loss that determines if two images are identical up to data augmentation, with no need for additional labels. A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution. When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1 accuracy with a ResNet-50 trained on ImageNet, which is a +1.8% absolute improvement over the AutoAugment method. When compared using cosine similarity, the same embeddings perform on par with the state of the art for image retrieval at moderate resolutions. |
Tasks | Data Augmentation, Image Classification, Image Retrieval |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05509v2 |
http://arxiv.org/pdf/1902.05509v2.pdf | |
PWC | https://paperswithcode.com/paper/multigrain-a-unified-image-embedding-for |
Repo | https://github.com/facebookresearch/multigrain |
Framework | pytorch |
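The pooling layer the abstract refers to is a generalized-mean (GeM) pooling layer. A sketch of it (the default exponent here is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeMPool(nn.Module):
    def __init__(self, p=3.0):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))  # learnable pooling exponent

    def forward(self, x):                       # x: (B, C, H, W)
        # p=1 recovers average pooling; large p approaches max pooling,
        # which is what lets a network trained at low resolution benefit
        # from high-resolution inputs at test time.
        x = x.clamp(min=1e-6).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)
```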
Language-Agnostic Syllabification with Neural Sequence Labeling
Title | Language-Agnostic Syllabification with Neural Sequence Labeling |
Authors | Jacob Krantz, Maxwell Dulin, Paul De Palma |
Abstract | The identification of syllables within phonetic sequences is known as syllabification. This task is thought to play an important role in natural language understanding, speech production, and the development of speech recognition systems. The concept of the syllable is cross-linguistic, though formal definitions are rarely agreed upon, even within a language. In response, data-driven syllabification methods have been developed to learn from syllabified examples. These methods often employ classical machine learning sequence labeling models. In recent years, recurrence-based neural networks have been shown to perform increasingly well for sequence labeling tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and chunking. We present a novel approach to the syllabification problem which leverages modern neural network techniques. Our network is constructed with long short-term memory (LSTM) cells, a convolutional component, and a conditional random field (CRF) output layer. Existing syllabification approaches are rarely evaluated across multiple language families. To demonstrate cross-linguistic generalizability, we show that the network is competitive with state-of-the-art systems in syllabifying English, Dutch, Italian, French, Manipuri, and Basque datasets. |
Tasks | Chunking, Named Entity Recognition, Part-Of-Speech Tagging, Speech Recognition |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13362v1 |
https://arxiv.org/pdf/1909.13362v1.pdf | |
PWC | https://paperswithcode.com/paper/language-agnostic-syllabification-with-neural |
Repo | https://github.com/jacobkrantz/lstm-syllabify |
Framework | none |
GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs
Title | GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs |
Authors | Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou |
Abstract | Finding local correspondences between images with different viewpoints requires local descriptors that are robust against geometric transformations. An approach for transformation invariance is to integrate out the transformations by pooling the features extracted from transformed versions of an image. However, the feature pooling may sacrifice the distinctiveness of the resulting descriptors. In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations. The key idea is that the features extracted from the transformed versions of an image can be viewed as a function defined on the group of the transformations. Instead of feature pooling, we use group convolutions to exploit underlying structures of the extracted features on the group, resulting in descriptors that are both discriminative and provably invariant to the group of transformations. Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation. |
Tasks | Pose Estimation |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.05932v1 |
https://arxiv.org/pdf/1911.05932v1.pdf | |
PWC | https://paperswithcode.com/paper/gift-learning-transformation-invariant-dense-1 |
Repo | https://github.com/zju3dv/GIFT |
Framework | pytorch |
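A heavily simplified sketch of the group-convolution idea (everything here is an assumption about shapes, not the zju3dv/GIFT code): features extracted from scaled and rotated copies of an image form a grid over the transformation group, and convolving then pooling over that grid yields a descriptor invariant to those transformations.

```python
import torch
import torch.nn as nn

class GroupDescriptor(nn.Module):
    def __init__(self, feat_dim=32, out_dim=128):
        super().__init__()
        # 2-D convolution over the (scale, rotation) group grid,
        # applied independently at each keypoint.
        self.group_conv = nn.Conv2d(feat_dim, out_dim,
                                    kernel_size=3, padding=1)

    def forward(self, group_feats):
        # group_feats: (n_keypoints, feat_dim, n_scales, n_rotations),
        # one feature vector per transformed copy of the image.
        h = self.group_conv(group_feats).relu()
        return h.mean(dim=(2, 3))  # pool over the group -> invariance
```

Pooling alone would give invariance but sacrifice distinctiveness; the group convolution before pooling is what preserves the structure of the features on the group.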
Graph-RISE: Graph-Regularized Image Semantic Embedding
Title | Graph-RISE: Graph-Regularized Image Semantic Embedding |
Authors | Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi |
Abstract | Learning image representations that capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state of the art, differentiates nuances at levels closer to human perception. |
Tasks | Image Classification, Image Retrieval |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.10814v1 |
http://arxiv.org/pdf/1902.10814v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-rise-graph-regularized-image-semantic |
Repo | https://github.com/tensorflow/neural-structured-learning |
Framework | tf |
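The linked repo implements the general recipe as neural structured learning. A sketch of a graph-regularized objective in that spirit (not the library's API): the supervised loss is augmented with a penalty pulling embeddings of graph neighbors together.

```python
import torch
import torch.nn.functional as F

def graph_regularized_loss(logits, labels, emb, neighbor_emb, alpha=0.1):
    """emb / neighbor_emb: embeddings of each image and a sampled
    graph neighbor; alpha weights the graph term."""
    supervised = F.cross_entropy(logits, labels)
    neighbor = (emb - neighbor_emb).pow(2).sum(dim=1).mean()
    return supervised + alpha * neighbor
```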
Learning Fairness in Multi-Agent Systems
Title | Learning Fairness in Multi-Agent Systems |
Authors | Jiechuan Jiang, Zongqing Lu |
Abstract | Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is key for many multi-agent systems. Incorporating fairness into multi-agent learning could help multi-agent systems become both efficient and stable. However, learning efficiency and fairness simultaneously is a complex, multi-objective, joint-policy optimization problem. To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model. We first decompose fairness for each agent and propose a fair-efficient reward that each agent learns its own policy to optimize. To avoid multi-objective conflict, we design a hierarchy consisting of a controller and several sub-policies, where the controller maximizes the fair-efficient reward by switching among the sub-policies, which provide diverse behaviors for interacting with the environment. FEN can be trained in a fully decentralized way, making it easy to deploy in real-world applications. Empirically, we show that FEN easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14472v1 |
https://arxiv.org/pdf/1910.14472v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-fairness-in-multi-agent-systems |
Repo | https://github.com/PKU-AI-Edge/FEN |
Framework | tf |
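A deliberately loose sketch of a fair-efficient reward in the spirit of FEN's decomposition (the exact functional form in the paper may differ; treat this as illustrative only): each agent is rewarded for raising average utility and penalized for deviating from it.

```python
import numpy as np

def fair_efficient_reward(utilities, i, c=1.0, alpha=1.0, eps=1e-8):
    """utilities: running per-agent utilities; i: this agent's index."""
    u_bar = utilities.mean()
    efficiency = u_bar / c                       # shared efficiency term
    fairness_penalty = abs(utilities[i] / (u_bar + eps) - 1.0)
    return efficiency - alpha * fairness_penalty

r = fair_efficient_reward(np.array([1.0, 0.5, 1.5]), i=0)
```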