Paper Group AWR 109
A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation. Revisiting Graph Neural Networks: All We Have is Low-Pass Filters. Human activity recognition from skeleton poses. Multivariate Time Series Classification using Dilated Convolutional Neural Network. Semantic Image Synthesis with Spatially-Adaptive Normalization. Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings. …
A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Title | A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation |
Authors | Varun Khare, Divyat Mahajan, Homanga Bharadhwaj, Vinay Verma, Piyush Rai |
Abstract | We present a domain adaptation based generative framework for zero-shot learning. Our framework addresses the problem of domain shift between the seen and unseen class distributions in zero-shot learning and minimizes the shift by developing a generative model trained via adversarial domain adaptation. Our approach is based on end-to-end learning of the class distributions of seen classes and unseen classes. To enable the model to learn the class distributions of unseen classes, we parameterize these class distributions in terms of the class attribute information (which is available for both seen and unseen classes). This provides a very simple way to learn the class distribution of any unseen class, given only its class attribute information, and no labeled training data. Training this model with adversarial domain adaptation further provides robustness against the distribution mismatch between the data from seen and unseen classes. Our approach also provides a novel way for training neural net based classifiers to overcome the hubness problem in zero-shot learning. Through a comprehensive set of experiments, we show that our model yields superior accuracies as compared to various state-of-the-art zero-shot learning models, on a variety of benchmark datasets. Code for the experiments is available at github.com/vkkhare/ZSL-ADA |
Tasks | Domain Adaptation, Zero-Shot Learning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03038v3 |
https://arxiv.org/pdf/1906.03038v3.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-framework-for-zero-shot-learning |
Repo | https://github.com/vkkhare/ZSL-ADA |
Framework | none |
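The abstract's key move is learning unseen-class distributions from attributes alone. As a rough illustration (not the authors' code; all names, sizes, and the Gaussian form are assumptions), a conditioner network can map a class-attribute vector to the parameters of a class-conditional Gaussian, so pseudo-features can be sampled even for classes with no labeled data:

```python
import torch
import torch.nn as nn

class AttributeConditionedGaussian(nn.Module):
    """Maps a class-attribute vector to a diagonal Gaussian over features."""
    def __init__(self, attr_dim, feat_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(attr_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, feat_dim)      # class mean
        self.logvar = nn.Linear(hidden, feat_dim)  # class log-variance

    def forward(self, attrs):
        h = self.net(attrs)
        return self.mu(h), self.logvar(h)

    def sample(self, attrs):
        # Reparameterization trick: synthesize pseudo-features for any
        # class, seen or unseen, given only its attribute vector.
        mu, logvar = self(attrs)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

gen = AttributeConditionedGaussian(attr_dim=85, feat_dim=2048)
fake_feats = gen.sample(torch.randn(32, 85))  # 85-d attributes, AwA-style
```

In the paper's framework, a generator like this would be trained jointly with an adversarial domain-adaptation objective to close the seen/unseen gap; the sketch above covers only the attribute-to-distribution parameterization.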
Revisiting Graph Neural Networks: All We Have is Low-Pass Filters
Title | Revisiting Graph Neural Networks: All We Have is Low-Pass Filters |
Authors | Hoang NT, Takanori Maehara |
Abstract | Graph neural networks have become one of the most important techniques to solve machine learning problems on graph-structured data. Recent work on vertex classification proposed deep and distributed learning models to achieve high performance and scalability. However, we find that the feature vectors of benchmark datasets are already quite informative for the classification task, and the graph structure only provides a means to denoise the data. In this paper, we develop a theoretical framework based on graph signal processing for analyzing graph neural networks. Our results indicate that graph neural networks only perform low-pass filtering on feature vectors and do not have the non-linear manifold learning property. We further investigate their resilience to feature noise and propose some insights on GCN-based graph neural network design. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09550v2 |
https://arxiv.org/pdf/1905.09550v2.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-graph-neural-networks-all-we-have |
Repo | https://github.com/gear/gfnn |
Framework | pytorch |
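The paper's central claim is that graph convolution acts as a low-pass filter on feature vectors. A minimal sketch of that view (my own simplification, not the gfnn repo's API): propagate features with the augmented normalized adjacency a few times, then hand the smoothed features to any ordinary classifier.

```python
import numpy as np

def low_pass_filter(adj, feats, k=2):
    """Apply (D^-1/2 (A + I) D^-1/2)^k to a dense feature matrix."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        feats = s @ feats                            # each hop denoises/smooths
    return feats
```

Under the paper's analysis, a plain linear model or MLP on `low_pass_filter(adj, feats)` captures most of what a k-layer GCN learns on these benchmarks.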
Human activity recognition from skeleton poses
Title | Human activity recognition from skeleton poses |
Authors | Frederico Belmonte Klein, Angelo Cangelosi |
Abstract | Human Action Recognition is an important task of Human-Robot Interaction, as cooperation between robots and humans requires that artificial agents recognise complex cues from the environment. A promising approach is using trained classifiers to recognise human actions through sequences of skeleton poses extracted from images or RGB-D data from a sensor. However, with many different datasets focused on slightly different sets of actions, and many different algorithms, it is not clear which strategy produces the highest accuracy for indoor activities performed in a home environment. This work discusses, tests, and compares classic algorithms, namely support vector machines and k-nearest neighbours, against two similar hierarchical neural gas approaches: the growing-when-required neural gas and the growing neural gas. |
Tasks | Activity Recognition, Human Activity Recognition, Temporal Action Localization |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.08928v1 |
https://arxiv.org/pdf/1908.08928v1.pdf | |
PWC | https://paperswithcode.com/paper/human-activity-recognition-from-skeleton |
Repo | https://github.com/frederico-klein/cad-gas |
Framework | none |
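A hedged sketch of the kind of comparison the paper runs for the classic baselines (the data loading is hypothetical; the scikit-learn calls are standard):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def compare_classifiers(X, y):
    """X: (n_samples, n_features) flattened skeleton-pose sequences;
    y: action labels."""
    for name, clf in [("SVM", SVC(kernel="rbf")),
                      ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The hierarchical neural gas variants the paper compares against are not part of scikit-learn; see the linked repo for those.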
Multivariate Time Series Classification using Dilated Convolutional Neural Network
Title | Multivariate Time Series Classification using Dilated Convolutional Neural Network |
Authors | Omolbanin Yazdanbakhsh, Scott Dick |
Abstract | Multivariate time series classification is a high-value and well-known problem in the machine learning community. Feature extraction is a key step in classification tasks. Traditional approaches employ hand-crafted features for classification, while convolutional neural networks (CNNs) are able to extract features automatically. In this paper, we use a dilated convolutional neural network for multivariate time series classification. To deploy the dilated CNN, a multivariate time series is transformed into an image-like format, and stacks of dilated and strided convolutions are applied to simultaneously extract features within and between the variates of the time series. We evaluate our model on two human activity recognition time series, finding that the automatically extracted features can be as effective as hand-crafted features. |
Tasks | Activity Recognition, Human Activity Recognition, Time Series, Time Series Classification |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01697v1 |
https://arxiv.org/pdf/1905.01697v1.pdf | |
PWC | https://paperswithcode.com/paper/multivariate-time-series-classification-using-1 |
Repo | https://github.com/SonbolYb/multivariate_timeseries_dilated_conv |
Framework | tf |
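A minimal sketch of the architecture the abstract describes (an assumption on my part rather than the repo's code, which is in TensorFlow): a multivariate series of shape (variates, time) is treated as an image-like array and passed through stacked dilated, strided 1-D convolutions.

```python
import torch
import torch.nn as nn

class DilatedTSClassifier(nn.Module):
    def __init__(self, n_variates, n_classes):
        super().__init__()
        layers, ch = [], n_variates
        for d in (1, 2, 4, 8):            # doubling dilation widens the
            layers += [nn.Conv1d(ch, 64,  # receptive field exponentially
                                 kernel_size=3, dilation=d,
                                 stride=2, padding=d),
                       nn.ReLU()]
            ch = 64
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, variates, time)
        h = self.features(x).mean(dim=-1)  # global average over time
        return self.head(h)

model = DilatedTSClassifier(n_variates=9, n_classes=6)  # HAR-style shapes
logits = model(torch.randn(8, 9, 128))
```

The stacked convolutions mix channels (between-variate features) and time steps (within-variate features) simultaneously, which is the property the abstract emphasizes.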
Semantic Image Synthesis with Spatially-Adaptive Normalization
Title | Semantic Image Synthesis with Spatially-Adaptive Normalization |
Authors | Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu |
Abstract | We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantics and style. Code is available at https://github.com/NVlabs/SPADE . |
Tasks | Image Generation, Image-to-Image Translation |
Published | 2019-03-18 |
URL | https://arxiv.org/abs/1903.07291v2 |
https://arxiv.org/pdf/1903.07291v2.pdf | |
PWC | https://paperswithcode.com/paper/semantic-image-synthesis-with-spatially |
Repo | https://github.com/Dominioncher/smart-sketch |
Framework | pytorch |
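The layer itself is compact. A simplified sketch of a SPADE block consistent with the paper's description (the official implementation at github.com/NVlabs/SPADE differs in details such as the normalization backbone and hidden width):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, norm_channels, label_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization: scale/shift come from the layout.
        self.norm = nn.BatchNorm2d(norm_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, norm_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, norm_channels, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the semantic layout to this activation's resolution, then
        # modulate per pixel, so layout information survives normalization
        # instead of being washed away.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```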
Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings
Title | Causality Extraction based on Self-Attentive BiLSTM-CRF with Transferred Embeddings |
Authors | Zhaoning Li, Qi Li, Xiaotian Zou, Jiangtao Ren |
Abstract | Causality extraction from natural language texts is a challenging open problem in artificial intelligence. Existing methods utilize patterns, constraints, and machine learning techniques to extract causality; they depend heavily on domain knowledge and require considerable human effort and time for feature engineering. In this paper, we formulate causality extraction as a sequence tagging problem based on a novel causality tagging scheme. On this basis, we propose a neural causality extractor with a BiLSTM-CRF model as the backbone, named SCIFI (Self-Attentive BiLSTM-CRF with Flair Embeddings), which can directly extract the Cause and Effect without separately extracting candidate causal pairs and identifying their relations. To tackle the problem of data insufficiency, we transfer contextual string embeddings, also known as Flair embeddings, trained on a large corpus into our task. Besides, to improve the performance of causality extraction, we introduce a multi-head self-attention mechanism into SCIFI to learn the dependencies between causal words. We evaluate our method on a public dataset, and experimental results demonstrate that our method achieves significant and consistent improvement over other baselines. |
Tasks | Feature Engineering |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07629v4 |
https://arxiv.org/pdf/1904.07629v4.pdf | |
PWC | https://paperswithcode.com/paper/causality-extraction-based-on-self-attentive |
Repo | https://github.com/Das-Boot/scifi |
Framework | none |
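A hedged sketch of the SCIFI backbone as the abstract describes it: a BiLSTM with multi-head self-attention producing per-token emission scores. The CRF transition layer and the Flair embedding lookup are omitted for brevity, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMAttnTagger(nn.Module):
    def __init__(self, emb_dim, hidden, n_tags, n_heads=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, n_heads,
                                          batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)  # emission scores for a CRF

    def forward(self, embeddings):       # (batch, seq_len, emb_dim)
        h, _ = self.lstm(embeddings)
        a, _ = self.attn(h, h, h)        # dependencies between causal words
        return self.emit(h + a)          # residual combine, then project

tagger = BiLSTMAttnTagger(emb_dim=100, hidden=128, n_tags=7)
scores = tagger(torch.randn(4, 30, 100))  # feed a CRF decoder in practice
```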
Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images
Title | Attention Mechanism Enhanced Kernel Prediction Networks for Denoising of Burst Images |
Authors | Bin Zhang, Shenyao Jin, Yili Xia, Yongming Huang, Zixiang Xiong |
Abstract | Deep learning based image denoising methods have been extensively investigated. In this paper, attention mechanism enhanced kernel prediction networks (AME-KPNs) are proposed for burst image denoising, in which nearly cost-free attention modules are adopted, first to refine the feature maps and then to make full use of the inter-frame and intra-frame redundancies within the whole image burst. The proposed AME-KPNs output per-pixel spatially-adaptive kernels, residual maps, and corresponding weight maps; the predicted kernels roughly restore clean pixels at their corresponding locations via an adaptive convolution operation, and the residuals are subsequently weighted and summed to compensate for the limited receptive field of the predicted kernels. Simulations and real-world experiments illustrate the robustness of the proposed AME-KPNs in burst image denoising. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08313v2 |
https://arxiv.org/pdf/1910.08313v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-mechanism-enhanced-kernel |
Repo | https://github.com/z-bingo/Attention-Mechanism-Enhanced-KPN |
Framework | pytorch |
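The adaptive convolution step is the heart of any kernel prediction network. A sketch of its mechanics (my reconstruction, not the repo's API): each pixel's predicted k×k kernel is applied to that pixel's own neighborhood.

```python
import torch
import torch.nn.functional as F

def adaptive_conv(frame, kernels, k=5):
    """frame: (B, C, H, W); kernels: (B, k*k, H, W), e.g. softmaxed per pixel."""
    b, c, h, w = frame.shape
    # Extract the k*k neighborhood around every pixel: (B, C, k*k, H, W).
    patches = F.unfold(frame, k, padding=k // 2).view(b, c, k * k, h, w)
    # Weight each neighborhood by its pixel's predicted kernel and sum.
    return (patches * kernels.unsqueeze(1)).sum(dim=2)
```

In the AME-KPN pipeline this runs per burst frame, and the weighted residual maps are then added to compensate for the kernels' limited receptive field.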
Multiple Human Tracking using Multi-Cues including Primitive Action Features
Title | Multiple Human Tracking using Multi-Cues including Primitive Action Features |
Authors | Hitoshi Nishimura, Kazuyuki Tasaka, Yasutomo Kawanishi, Hiroshi Murase |
Abstract | In this paper, we propose a Multiple Human Tracking method using multi-cues including Primitive Action Features (MHT-PAF). MHT-PAF performs accurate human tracking in dynamic aerial videos captured by a drone. PAF employs global context, the rich information carried by multi-label actions, and mid-level features. The accurate tracking results obtained using PAF help multi-frame-based action recognition. In the experiments, we verified the effectiveness of the proposed method on the Okutama-Action dataset. Our code is available online. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08171v1 |
https://arxiv.org/pdf/1909.08171v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-human-tracking-using-multi-cues |
Repo | https://github.com/hitottiez/mht-paf |
Framework | none |
Learning from Synthetic Data for Crowd Counting in the Wild
Title | Learning from Synthetic Data for Crowd Counting in the Wild |
Authors | Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan |
Abstract | Recently, counting the number of people in crowd scenes has become a hot topic because of its widespread applications (e.g. video surveillance, public security). It is a difficult task in the wild: changing environments and widely varying crowd sizes prevent current methods from working well. In addition, due to scarce data, many methods suffer from over-fitting to varying extents. To remedy these two problems, firstly, we develop a data collector and labeler that can generate synthetic crowd scenes and simultaneously annotate them without any manpower. Based on it, we build a large-scale, diverse synthetic dataset. Secondly, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly improves the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/. |
Tasks | Crowd Counting, Domain Adaptation |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03303v1 |
http://arxiv.org/pdf/1903.03303v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-synthetic-data-for-crowd |
Repo | https://github.com/gjy3035/GCC-SFCN |
Framework | pytorch |
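Scheme 1 is a plain pretrain-then-finetune recipe. A minimal sketch (the counter, loaders, and hyperparameters are placeholders, not values from the paper):

```python
import torch

def train(model, loader, optimizer, epochs):
    loss_fn = torch.nn.MSELoss()              # density-map regression
    for _ in range(epochs):
        for images, density_maps in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), density_maps)
            loss.backward()
            optimizer.step()

# Pretrain on synthetic GCC crowds, then finetune on real data,
# typically with a smaller learning rate:
# train(counter, synthetic_loader, torch.optim.Adam(counter.parameters(), 1e-4), 50)
# train(counter, real_loader, torch.optim.Adam(counter.parameters(), 1e-5), 20)
```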
Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance
Title | Diamonds in the Rough: Generating Fluent Sentences from Early-Stage Drafts for Academic Writing Assistance |
Authors | Takumi Ito, Tatsuki Kuribayashi, Hayato Kobayashi, Ana Brassard, Masato Hagiwara, Jun Suzuki, Kentaro Inui |
Abstract | The writing process consists of several stages such as drafting, revising, editing, and proofreading. Studies on writing assistance, such as grammatical error correction (GEC), have mainly focused on sentence editing and proofreading, where surface-level issues such as typographical, spelling, or grammatical errors should be corrected. We broaden this focus to include the earlier revising stage, where sentences require adjustments to the included information or major rewriting, and we propose Sentence-level Revision (SentRev) as a new writing assistance task. Well-performing systems in this task can help inexperienced authors by producing fluent, complete sentences given their rough, incomplete drafts. For developing and evaluating SentRev models, we build a new freely available crowdsourced evaluation dataset consisting of incomplete sentences authored by non-native writers paired with their final versions extracted from published academic papers. We also establish baseline performance on SentRev using our newly built evaluation dataset. |
Tasks | Grammatical Error Correction |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09180v1 |
https://arxiv.org/pdf/1910.09180v1.pdf | |
PWC | https://paperswithcode.com/paper/diamonds-in-the-rough-generating-fluent |
Repo | https://github.com/taku-ito/INLG2019_SentRev |
Framework | none |
MultiGrain: a unified image embedding for classes and instances
Title | MultiGrain: a unified image embedding for classes and instances |
Authors | Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze |
Abstract | MultiGrain is a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized based on the object class, the particular object, or distorted copies thereof. Our joint training is simple: we minimize a cross-entropy loss for classification and a ranking loss that determines if two images are identical up to data augmentation, with no need for additional labels. A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution. When fed to a linear classifier, the learned embeddings provide state-of-the-art classification accuracy. For instance, we obtain 79.4% top-1 accuracy with a ResNet-50 trained on ImageNet, which is a +1.8% absolute improvement over the AutoAugment method. When compared using cosine similarity, the same embeddings perform on par with the state of the art for image retrieval at moderate resolutions. |
Tasks | Data Augmentation, Image Classification, Image Retrieval |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05509v2 |
http://arxiv.org/pdf/1902.05509v2.pdf | |
PWC | https://paperswithcode.com/paper/multigrain-a-unified-image-embedding-for |
Repo | https://github.com/facebookresearch/multigrain |
Framework | pytorch |
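The pooling layer the abstract refers to is a generalized-mean (GeM) pooling layer. A sketch of it (the default exponent here is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeMPool(nn.Module):
    def __init__(self, p=3.0):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))  # learnable pooling exponent

    def forward(self, x):                       # x: (B, C, H, W)
        # p=1 recovers average pooling; large p approaches max pooling,
        # which is what lets a network trained at low resolution benefit
        # from high-resolution inputs at test time.
        x = x.clamp(min=1e-6).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)
```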
Language-Agnostic Syllabification with Neural Sequence Labeling
Title | Language-Agnostic Syllabification with Neural Sequence Labeling |
Authors | Jacob Krantz, Maxwell Dulin, Paul De Palma |
Abstract | The identification of syllables within phonetic sequences is known as syllabification. This task is thought to play an important role in natural language understanding, speech production, and the development of speech recognition systems. The concept of the syllable is cross-linguistic, though formal definitions are rarely agreed upon, even within a language. In response, data-driven syllabification methods have been developed to learn from syllabified examples. These methods often employ classical machine learning sequence labeling models. In recent years, recurrence-based neural networks have been shown to perform increasingly well for sequence labeling tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and chunking. We present a novel approach to the syllabification problem which leverages modern neural network techniques. Our network is constructed with long short-term memory (LSTM) cells, a convolutional component, and a conditional random field (CRF) output layer. Existing syllabification approaches are rarely evaluated across multiple language families. To demonstrate cross-linguistic generalizability, we show that the network is competitive with state-of-the-art systems in syllabifying English, Dutch, Italian, French, Manipuri, and Basque datasets. |
Tasks | Chunking, Named Entity Recognition, Part-Of-Speech Tagging, Speech Recognition |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13362v1 |
https://arxiv.org/pdf/1909.13362v1.pdf | |
PWC | https://paperswithcode.com/paper/language-agnostic-syllabification-with-neural |
Repo | https://github.com/jacobkrantz/lstm-syllabify |
Framework | none |
GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs
Title | GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs |
Authors | Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou |
Abstract | Finding local correspondences between images with different viewpoints requires local descriptors that are robust against geometric transformations. An approach for transformation invariance is to integrate out the transformations by pooling the features extracted from transformed versions of an image. However, the feature pooling may sacrifice the distinctiveness of the resulting descriptors. In this paper, we introduce a novel visual descriptor named Group Invariant Feature Transform (GIFT), which is both discriminative and robust to geometric transformations. The key idea is that the features extracted from the transformed versions of an image can be viewed as a function defined on the group of the transformations. Instead of feature pooling, we use group convolutions to exploit underlying structures of the extracted features on the group, resulting in descriptors that are both discriminative and provably invariant to the group of transformations. Extensive experiments show that GIFT outperforms state-of-the-art methods on several benchmark datasets and practically improves the performance of relative pose estimation. |
Tasks | Pose Estimation |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.05932v1 |
https://arxiv.org/pdf/1911.05932v1.pdf | |
PWC | https://paperswithcode.com/paper/gift-learning-transformation-invariant-dense-1 |
Repo | https://github.com/zju3dv/GIFT |
Framework | pytorch |
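A heavily simplified sketch of the group-convolution idea (everything here is an assumption about shapes, not the zju3dv/GIFT code): features extracted from scaled and rotated copies of an image form a grid over the transformation group, and convolving then pooling over that grid yields a descriptor invariant to those transformations.

```python
import torch
import torch.nn as nn

class GroupDescriptor(nn.Module):
    def __init__(self, feat_dim=32, out_dim=128):
        super().__init__()
        # 2-D convolution over the (scale, rotation) group grid,
        # applied independently at each keypoint.
        self.group_conv = nn.Conv2d(feat_dim, out_dim,
                                    kernel_size=3, padding=1)

    def forward(self, group_feats):
        # group_feats: (n_keypoints, feat_dim, n_scales, n_rotations),
        # one feature vector per transformed copy of the image.
        h = self.group_conv(group_feats).relu()
        return h.mean(dim=(2, 3))  # pool over the group -> invariance
```

Pooling alone would give invariance but sacrifice distinctiveness; the group convolution before pooling is what preserves the structure of the features on the group.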
Graph-RISE: Graph-Regularized Image Semantic Embedding
Title | Graph-RISE: Graph-Regularized Image Semantic Embedding |
Authors | Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi |
Abstract | Learning image representations that capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state of the art, differentiates nuances at levels closer to human perception. |
Tasks | Image Classification, Image Retrieval |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.10814v1 |
http://arxiv.org/pdf/1902.10814v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-rise-graph-regularized-image-semantic |
Repo | https://github.com/tensorflow/neural-structured-learning |
Framework | tf |
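The linked repo implements the general recipe as neural structured learning. A sketch of a graph-regularized objective in that spirit (not the library's API): the supervised loss is augmented with a penalty pulling embeddings of graph neighbors together.

```python
import torch
import torch.nn.functional as F

def graph_regularized_loss(logits, labels, emb, neighbor_emb, alpha=0.1):
    """emb / neighbor_emb: embeddings of each image and a sampled
    graph neighbor; alpha weights the graph term."""
    supervised = F.cross_entropy(logits, labels)
    neighbor = (emb - neighbor_emb).pow(2).sum(dim=1).mean()
    return supervised + alpha * neighbor
```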
Learning Fairness in Multi-Agent Systems
Title | Learning Fairness in Multi-Agent Systems |
Authors | Jiechuan Jiang, Zongqing Lu |
Abstract | Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is key for many multi-agent systems. Incorporating fairness into multi-agent learning could help multi-agent systems become both efficient and stable. However, learning efficiency and fairness simultaneously is a complex, multi-objective, joint-policy optimization problem. To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model. We first decompose fairness for each agent and propose a fair-efficient reward that each agent learns its own policy to optimize. To avoid multi-objective conflict, we design a hierarchy consisting of a controller and several sub-policies, where the controller maximizes the fair-efficient reward by switching among the sub-policies, which provide diverse behaviors for interacting with the environment. FEN can be trained in a fully decentralized way, making it easy to deploy in real-world applications. Empirically, we show that FEN easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14472v1 |
https://arxiv.org/pdf/1910.14472v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-fairness-in-multi-agent-systems |
Repo | https://github.com/PKU-AI-Edge/FEN |
Framework | tf |
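A deliberately loose sketch of a fair-efficient reward in the spirit of FEN's decomposition (the exact functional form in the paper may differ; treat this as illustrative only): each agent is rewarded for raising average utility and penalized for deviating from it.

```python
import numpy as np

def fair_efficient_reward(utilities, i, c=1.0, alpha=1.0, eps=1e-8):
    """utilities: running per-agent utilities; i: this agent's index."""
    u_bar = utilities.mean()
    efficiency = u_bar / c                       # shared efficiency term
    fairness_penalty = abs(utilities[i] / (u_bar + eps) - 1.0)
    return efficiency - alpha * fairness_penalty

r = fair_efficient_reward(np.array([1.0, 0.5, 1.5]), i=0)
```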