January 31, 2020

3294 words 16 mins read

Paper Group AWR 379

#MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement. Simple yet efficient real-time pose-based action recognition. Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German. Unsupervised Dialog Structure Learning. EATEN: Entity-aware Attention for Single Shot Visual Text Extract …

Title #MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement
Authors Akash Gautam, Puneet Mathur, Rakesh Gosangi, Debanjan Mahata, Ramit Sawhney, Rajiv Ratn Shah
Abstract In this paper, we present a dataset containing 9,973 tweets related to the MeToo movement that were manually annotated for five different linguistic aspects: relevance, stance, hate speech, sarcasm, and dialogue acts. We present a detailed account of the data collection and annotation processes. The annotations have a very high inter-annotator agreement (0.79 to 0.93 k-alpha) due to the domain expertise of the annotators and clear annotation instructions. We analyze the data in terms of geographical distribution, label correlations, and keywords. Lastly, we present some potential use cases of this dataset. We expect this dataset will be of great interest to psycholinguists, sociolinguists, and computational linguists studying the discursive space of digitally mobilized social movements on sensitive issues like sexual harassment.
Tasks
Published 2019-12-14
URL https://arxiv.org/abs/1912.06927v1
PDF https://arxiv.org/pdf/1912.06927v1.pdf
PWC https://paperswithcode.com/paper/metooma-multi-aspect-annotations-of-tweets
Repo https://github.com/midas-research/MeTooMA
Framework none
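
The agreement statistic here appears to be Krippendorff's alpha. As a rough illustration of how such a figure is computed, here is a minimal sketch using the third-party krippendorff package (an assumption; the authors do not state their tooling), with annotators as rows and tweets as columns:

```python
# Sketch: computing inter-annotator agreement (Krippendorff's alpha)
# for a nominal label such as stance. Assumes the third-party
# `krippendorff` package (pip install krippendorff); the matrix below
# is illustrative, not the authors' actual data.
import numpy as np
import krippendorff

# reliability_data: one row per annotator, one column per tweet;
# np.nan marks tweets an annotator did not label.
reliability_data = np.array([
    [0, 1, 1, 0, 2, np.nan],   # annotator 1
    [0, 1, 1, 0, 2, 1],        # annotator 2
    [0, 1, 0, 0, 2, 1],        # annotator 3
])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```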

Simple yet efficient real-time pose-based action recognition

Title Simple yet efficient real-time pose-based action recognition
Authors Dennis Ludl, Thomas Gulde, Cristóbal Curio
Abstract Recognizing human actions is a core challenge for autonomous systems, as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. In order to train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time, and recognize their actions in real time with standard monocular camera sensors. For action recognition, we encode the human pose into a new data format called the Encoded Human Pose Image (EHPI), which can then be classified using standard methods from the computer vision community. With this simple procedure we achieve performance competitive with the state of the art in pose-based action detection while ensuring real-time operation. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
Tasks Action Detection, Autonomous Driving, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-04-19
URL http://arxiv.org/abs/1904.09140v1
PDF http://arxiv.org/pdf/1904.09140v1.pdf
PWC https://paperswithcode.com/paper/simple-yet-efficient-real-time-pose-based
Repo https://github.com/noboevbo/ehpi_action_recognition
Framework pytorch
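
The EHPI idea is to rasterize a window of 2D poses into an image so that a standard image CNN can do the classification. A minimal sketch of one plausible encoding, assuming normalized (x, y) joint coordinates; the exact channel layout of the paper's EHPI may differ:

```python
# Sketch: encoding a sequence of 2D poses as an image (EHPI-style).
# Assumes `poses` has shape (T, J, 2) with coordinates already
# normalized to [0, 1]; the precise layout used in the paper may differ.
import numpy as np

def encode_pose_sequence(poses: np.ndarray) -> np.ndarray:
    """Map T frames of J joints to a (J, T, 3) uint8 image:
    x in the red channel, y in the green channel, blue unused."""
    T, J, _ = poses.shape
    img = np.zeros((J, T, 3), dtype=np.uint8)
    img[:, :, 0] = (poses[:, :, 0].T * 255).astype(np.uint8)  # x coords
    img[:, :, 1] = (poses[:, :, 1].T * 255).astype(np.uint8)  # y coords
    return img  # classify with any standard image CNN

poses = np.random.rand(32, 15, 2)  # 32 frames, 15 joints
ehpi = encode_pose_sequence(poses)
print(ehpi.shape)  # (15, 32, 3)
```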

Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German

Title Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German
Authors Lucy Linder, Michael Jungo, Jean Hennebert, Claudiu Musat, Andreas Fischer
Abstract This paper presents SwissCrawl, the largest Swiss German text corpus to date. Composed of more than half a million sentences, it was generated using a customized web scraping tool that could be applied to other low-resource languages as well. The approach demonstrates how freely available web pages can be used to construct comprehensive text corpora, which are of fundamental importance for natural language processing. In an experimental evaluation, we show that using the new corpus leads to significant improvements for the task of language modeling. To capture new content, our approach will run continuously to keep increasing the corpus over time.
Tasks Language Modelling
Published 2019-11-30
URL https://arxiv.org/abs/1912.00159v2
PDF https://arxiv.org/pdf/1912.00159v2.pdf
PWC https://paperswithcode.com/paper/automatic-creation-of-text-corpora-for-low
Repo https://github.com/derlin/swisstext
Framework none
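
A toy sketch of the crawl-and-filter idea behind such corpus construction; `is_swiss_german` is a hypothetical stand-in for the trained language identifier a real pipeline like SwissCrawl would rely on:

```python
# Sketch: harvest sentences from a web page and keep those a
# classifier flags as Swiss German. The filter below is a placeholder,
# not the SwissCrawl pipeline's actual model.
import re
import requests
from bs4 import BeautifulSoup

def is_swiss_german(sentence: str) -> bool:
    # Placeholder: a real pipeline uses a trained language identifier.
    return len(sentence.split()) >= 4

def harvest(url: str):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if is_swiss_german(s.strip())]
```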

Unsupervised Dialog Structure Learning

Title Unsupervised Dialog Structure Learning
Authors Weiyan Shi, Tiancheng Zhao, Zhou Yu
Abstract Learning a shared dialog structure from a set of task-oriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs and, more importantly, contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Unlike existing HMM-based models, our model is based on the variational autoencoder (VAE) and is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that, qualitatively, our method extracts meaningful dialog structure and, quantitatively, outperforms previous models in its ability to predict unseen data. We further evaluate the model's effectiveness in a downstream task, dialog system building. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting.
Tasks
Published 2019-04-07
URL https://arxiv.org/abs/1904.03736v2
PDF https://arxiv.org/pdf/1904.03736v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-dialog-structure-learning
Repo https://github.com/wyshi/Unsupervised-Structure-Learning
Framework tf
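
The discrete latent state at each dialog turn can be kept differentiable with a Gumbel-softmax relaxation, which is one standard choice. A hedged sketch of a single recurrence step; names and sizes are illustrative, and the paper's model additionally has separate prior and decoder networks:

```python
# Sketch: one step of a VRNN-style recurrence with a discrete latent
# dialog state via Gumbel-softmax. Illustrative only; the paper's model
# also has a prior network and an utterance decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVRNNStep(nn.Module):
    def __init__(self, x_dim=64, h_dim=128, n_states=10):
        super().__init__()
        self.posterior = nn.Linear(x_dim + h_dim, n_states)
        self.rnn = nn.GRUCell(x_dim + n_states, h_dim)

    def forward(self, x_t, h_prev, tau=1.0):
        logits = self.posterior(torch.cat([x_t, h_prev], dim=-1))
        # Differentiable sample of a (near) one-hot dialog state.
        z_t = F.gumbel_softmax(logits, tau=tau, hard=True)
        h_t = self.rnn(torch.cat([x_t, z_t], dim=-1), h_prev)
        return z_t, h_t

step = DiscreteVRNNStep()
x, h = torch.randn(4, 64), torch.zeros(4, 128)
z, h = step(x, h)
print(z.argmax(-1))  # inferred discrete state per dialog
```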

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Title EATEN: Entity-aware Attention for Single Shot Visual Text Extraction
Authors He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding
Abstract Extracting entities from images is a crucial part of many OCR applications, such as entity recognition for cards, invoices, and receipts. Most existing works employ the classical detection-and-recognition paradigm. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, an end-to-end trainable system that extracts entities without any post-processing. In the proposed framework, each entity is parsed by its own entity-aware decoder. Moreover, we introduce a state transition mechanism that further improves the robustness of entity extraction. Given the absence of public benchmarks, we construct a dataset of almost 0.6 million images across three real-world scenarios (train ticket, passport, and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of our knowledge, EATEN is the first single-shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate the state-of-the-art performance of EATEN.
Tasks Entity Extraction, Optical Character Recognition
Published 2019-09-20
URL https://arxiv.org/abs/1909.09380v1
PDF https://arxiv.org/pdf/1909.09380v1.pdf
PWC https://paperswithcode.com/paper/eaten-entity-aware-attention-for-single-shot
Repo https://github.com/beacandler/EATEN
Framework none
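
The framework's per-entity decoding can be approximated by giving each target field its own attention decoder over shared visual features. A simplified sketch under that assumption; the entity list, sizes, and attention form are illustrative, and the paper's state transition mechanism is omitted:

```python
# Sketch: one attention decoder per entity over shared CNN features
# (EATEN-style, simplified). Field names and dimensions are
# hypothetical; the state transition mechanism is not modeled.
import torch
import torch.nn as nn

class EntityDecoder(nn.Module):
    def __init__(self, feat_dim=256, vocab=100, max_len=20):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.gru = nn.GRUCell(feat_dim, feat_dim)
        self.out = nn.Linear(feat_dim, vocab)
        self.max_len = max_len

    def forward(self, feats):                     # feats: (B, N, D)
        h = feats.mean(dim=1)                     # initial decoder state
        chars = []
        for _ in range(self.max_len):
            w = torch.softmax(                    # attention over regions
                self.attn(torch.tanh(feats + h.unsqueeze(1))), dim=1)
            ctx = (w * feats).sum(dim=1)          # attended visual context
            h = self.gru(ctx, h)
            chars.append(self.out(h).argmax(-1))  # greedy character pick
        return torch.stack(chars, dim=1)          # (B, max_len)

# One decoder per entity of interest (field names are hypothetical).
decoders = nn.ModuleDict({k: EntityDecoder() for k in
                          ("name", "date", "number")})
feats = torch.randn(2, 196, 256)                  # shared CNN features
preds = {k: dec(feats) for k, dec in decoders.items()}
```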

Global Aggregation then Local Distribution in Fully Convolutional Networks

Title Global Aggregation then Local Distribution in Fully Convolutional Networks
Authors Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong
Abstract It has been widely proven that modelling long-range dependencies in fully convolutional networks (FCNs) via global aggregation modules is critical for complex scene understanding tasks such as semantic segmentation and object detection. However, global aggregation is often dominated by features of large patterns and tends to oversmooth regions that contain small patterns (e.g., boundaries and small objects). To resolve this problem, we propose to first use Global Aggregation and then Local Distribution, which we call GALD, where long-range dependencies are used more confidently inside large-pattern regions and more cautiously elsewhere. The size of each pattern at each position is estimated in the network as a per-channel mask map. GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and it consistently improves the performance of state-of-the-art object detection and instance segmentation approaches. In particular, GALD applied to semantic segmentation achieves new state-of-the-art performance on the Cityscapes test set with 83.3% mIoU. Code is available at https://github.com/lxtGH/GALD-Net
Tasks Instance Segmentation, Object Detection, Scene Understanding, Semantic Segmentation
Published 2019-09-16
URL https://arxiv.org/abs/1909.07229v1
PDF https://arxiv.org/pdf/1909.07229v1.pdf
PWC https://paperswithcode.com/paper/global-aggregation-then-local-distribution-in
Repo https://github.com/lxtGH/GALD-Net
Framework pytorch
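
A simplified sketch of the aggregate-then-distribute pattern: global context is computed once, then gated per channel and position by a predicted mask. The published GALD supports several aggregation modules; this sketch uses plain global average pooling, and all sizes are illustrative:

```python
# Sketch: Global Aggregation then Local Distribution (simplified).
# Global average pooling stands in for the paper's aggregation modules.
import torch
import torch.nn as nn

class SimpleGALD(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(          # per-channel, per-position mask
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        g = x.mean(dim=(2, 3), keepdim=True)   # global aggregation
        m = self.mask(x)                       # local distribution weights
        return x + m * g                       # add context where trusted

feat = torch.randn(2, 256, 64, 64)
out = SimpleGALD(256)(feat)
print(out.shape)  # torch.Size([2, 256, 64, 64])
```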

Generator evaluator-selector net: a modular approach for panoptic segmentation

Title Generator evaluator-selector net: a modular approach for panoptic segmentation
Authors Sagi Eppel, Alan Aspuru-Guzik
Abstract In machine learning and other fields, suggesting a good solution to a problem is usually a harder task than evaluating the quality of such a solution. This asymmetry is the basis for a large number of selection-oriented methods that use a generator system to guess a set of solutions and an evaluator system to rank and select the best ones. This work examines the use of this approach for image segmentation. The generator/evaluator approach in this case consists of two independent convolutional neural nets: a generator net that suggests a variety of segments corresponding to objects and distinct regions in the image, and an evaluator net that chooses the best segments to be merged into the segmentation map. The result is a trial-and-error evolutionary approach in which a generator that guesses segments with low average accuracy, but with wide variability, can still produce good results when coupled with an accurate evaluator. Generating and evaluating each segment separately is essential here, since it demands exponentially fewer guesses than a system that guesses and evaluates the full segmentation map in each try. Another form of modularity used in this system is the separation of segmentation and classification into independent neural nets, which allows the segmentation to be class-agnostic and hence capable of segmenting unfamiliar categories that were not part of the training set. The method was examined on the COCO panoptic segmentation benchmark and gave results competitive with standard semantic segmentation and instance segmentation methods.
Tasks Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published 2019-08-24
URL https://arxiv.org/abs/1908.09108v2
PDF https://arxiv.org/pdf/1908.09108v2.pdf
PWC https://paperswithcode.com/paper/generator-evaluator-selector-net-a-modular
Repo https://github.com/sagieppel/Generator-evaluator-selector-net-a-modular-approach-for-panoptic-segmentation
Framework pytorch
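
The selection step itself can be as simple as scoring every proposed segment and greedily keeping the best non-overlapping ones. A hedged sketch of that loop, with the two networks abstracted as callables and the merging rule simplified relative to the paper:

```python
# Sketch: greedy generator/evaluator selection of segments.
# `generator` and `evaluator` stand in for the two CNNs; scoring and
# overlap handling here are simplified relative to the paper.
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def select_segments(image, generator, evaluator, iou_thresh=0.5):
    segments = generator(image)              # list of boolean masks
    scores = [evaluator(image, s) for s in segments]
    order = np.argsort(scores)[::-1]         # best-scored first
    chosen = []
    for i in order:
        s = segments[i]
        # Skip segments that substantially overlap an accepted one.
        if all(iou(s, c) < iou_thresh for c in chosen):
            chosen.append(s)
    return chosen                            # masks to merge into the map
```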

InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting

Title InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting
Authors Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, Cewu Lu
Abstract Instance segmentation requires a large number of training samples to achieve satisfactory performance and benefits from proper data augmentation. To enlarge the training set and increase its diversity, previous methods have investigated using data annotations from other domains (e.g., bounding boxes, points) in a weakly supervised mechanism. In this paper, we present a simple, efficient and effective method to augment the training set using the existing instance mask annotations. Exploiting the pixel redundancy of the background, we are able to improve the performance of Mask R-CNN by 1.7 mAP on the COCO dataset and 3.3 mAP on the Pascal VOC dataset by simply introducing random jittering to objects. Furthermore, we propose a location probability map based approach to explore the feasible locations where objects can be placed, based on local appearance similarity. With the guidance of such a map, we boost the instance segmentation performance of R101-Mask R-CNN from 35.7 mAP to 37.9 mAP without modifying the backbone or network structure. Our method is simple to implement and does not increase the computational complexity. It can be integrated into the training pipeline of any instance segmentation model without affecting the training and inference efficiency. Our code and models have been released at https://github.com/GothicAi/InstaBoost
Tasks Data Augmentation, Instance Segmentation, Semantic Segmentation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07801v1
PDF https://arxiv.org/pdf/1908.07801v1.pdf
PWC https://paperswithcode.com/paper/instaboost-boosting-instance-segmentation-via
Repo https://github.com/GothicAi/InstaBoost
Framework pytorch
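
The cheapest form of the idea is to cut an object out along its mask and paste it back at a jittered location. A simplified numpy sketch; the full method additionally builds a location probability map from local appearance similarity and fills the vacated background more carefully:

```python
# Sketch: random-jitter copy-paste of one annotated instance.
# Simplified: the hole left behind is filled with the object's mean
# color, whereas the paper exploits background pixel redundancy.
import numpy as np

def jitter_instance(img, mask, max_shift=20, rng=np.random):
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
    out, new_mask = img.copy(), np.zeros_like(mask)
    out[mask] = img[mask].mean(axis=0)       # crude background fill
    ys, xs = np.nonzero(mask)
    ys2 = np.clip(ys + dy, 0, img.shape[0] - 1)
    xs2 = np.clip(xs + dx, 0, img.shape[1] - 1)
    out[ys2, xs2] = img[ys, xs]              # paste object at new spot
    new_mask[ys2, xs2] = True
    return out, new_mask                     # updated training sample

img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
mask = np.zeros((128, 128), dtype=bool)
mask[40:80, 40:80] = True
aug_img, aug_mask = jitter_instance(img, mask)
```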

MTab: Matching Tabular Data to Knowledge Graph using Probability Models

Title MTab: Matching Tabular Data to Knowledge Graph using Probability Models
Authors Phuc Nguyen, Natthawut Kertkeidkachorn, Ryutaro Ichise, Hideaki Takeda
Abstract This paper presents the design of our system, MTab, for the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019). MTab combines a voting algorithm and probability models to solve the critical problems of the matching tasks. Results on SemTab 2019 show that MTab obtains promising performance on the three matching tasks.
Tasks Graph Matching
Published 2019-10-01
URL https://arxiv.org/abs/1910.00246v2
PDF https://arxiv.org/pdf/1910.00246v2.pdf
PWC https://paperswithcode.com/paper/mtab-matching-tabular-data-to-knowledge-graph
Repo https://github.com/phucty/MTab
Framework none
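
A toy sketch of a voting step in this spirit: each cell votes for the classes of its candidate entities, and the column takes the majority class. Everything here, including the lookup function, is illustrative rather than MTab's actual pipeline:

```python
# Sketch: majority-vote class annotation for one table column.
# `lookup` stands in for a knowledge graph candidate search; all
# names and the toy KB are hypothetical.
from collections import Counter

def vote_column_class(cells, lookup):
    """Each cell votes for the classes of its candidate entities;
    the column is annotated with the majority class."""
    votes = Counter()
    for cell in cells:
        for entity, cls in lookup(cell):   # candidate (entity, class)
            votes[cls] += 1
    return votes.most_common(1)[0][0] if votes else None

def toy_lookup(cell):
    kb = {"Tokyo": [("Q1490", "city")], "Paris": [("Q90", "city")],
          "Japan": [("Q17", "country")]}
    return kb.get(cell, [])

print(vote_column_class(["Tokyo", "Paris"], toy_lookup))  # city
```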

GAN Based Image Deblurring Using Dark Channel Prior

Title GAN Based Image Deblurring Using Dark Channel Prior
Authors Shuang Zhang, Ada Zhen, Robert L. Stevenson
Abstract A conditional generative adversarial network (GAN) is proposed for the image deblurring problem. It is tailored for image deblurring rather than simply applying a GAN to the deblurring problem. Motivated by this, the dark channel prior is carefully chosen and incorporated into the loss function for network training. To make it compatible with neural networks, its original non-differentiable form is discarded and an L2 norm is adopted instead. On both synthetic datasets and noisy natural images, the proposed network shows improved deblurring performance and robustness to image noise, both qualitatively and quantitatively. Additionally, compared to existing end-to-end deblurring networks, our network structure is lightweight, which ensures less training and testing time.
Tasks Deblurring
Published 2019-02-28
URL http://arxiv.org/abs/1903.00107v1
PDF http://arxiv.org/pdf/1903.00107v1.pdf
PWC https://paperswithcode.com/paper/gan-based-image-deblurring-using-dark-channel
Repo https://github.com/tanyakatiyar/image-deblur
Framework none
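
The dark channel of an image is the patch-wise minimum over all color channels; sharp natural images have mostly zero dark channels, so penalizing the dark channel's L2 norm discourages blur. A sketch of a differentiable version in PyTorch, with the patch size an assumption:

```python
# Sketch: differentiable dark channel and an L2 loss on it.
# Min-pooling is expressed as negated max-pooling; the 15x15 patch
# is a common choice, not necessarily the paper's exact setting.
import torch
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """img: (B, 3, H, W) in [0, 1] -> (B, 1, H, W) dark channel."""
    per_pixel_min = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-per_pixel_min, patch, stride=1,
                         padding=patch // 2)

def dark_channel_loss(deblurred):
    return dark_channel(deblurred).pow(2).mean()  # L2 surrogate

x = torch.rand(2, 3, 64, 64, requires_grad=True)
dark_channel_loss(x).backward()  # gradients flow through the min/max
```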

Direct Feedback Alignment with Sparse Connections for Local Learning

Title Direct Feedback Alignment with Sparse Connections for Local Learning
Authors Brian Crafton, Abhinav Parihar, Evan Gebhardt, Arijit Raychowdhury
Abstract Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Commonly referred to as the weight transport problem, each neuron's dependence on the weights and errors located deeper in the network requires exhaustive data movement, which presents a key obstacle to enhancing the performance and energy efficiency of machine-learning hardware. In this work, we propose a bio-plausible alternative to backpropagation, drawing from advances in feedback alignment algorithms, in which the error computation at a single synapse reduces to the product of three scalar values. Using a sparse feedback matrix, we show that a neuron needs only a fraction of the information previously used by feedback alignment algorithms. Consequently, memory and compute can be partitioned and distributed whichever way produces the most efficient forward pass, so long as a single error can be delivered to each neuron. Our results show orders-of-magnitude improvement in data movement and a 2× improvement in multiply-and-accumulate operations over backpropagation. Like previous work, we observe that any variant of feedback alignment suffers significant losses in classification accuracy on deep convolutional neural networks. By transferring trained convolutional layers and training the fully connected layers using direct feedback alignment, we demonstrate that direct feedback alignment can obtain results competitive with backpropagation. Furthermore, we observe that using an extremely sparse feedback matrix, rather than a dense one, results in a small accuracy drop while yielding hardware advantages. All the code and results are available at https://github.com/bcrafton/ssdfa.
Tasks
Published 2019-01-30
URL https://arxiv.org/abs/1903.02083v2
PDF https://arxiv.org/pdf/1903.02083v2.pdf
PWC https://paperswithcode.com/paper/direct-feedback-alignment-with-sparse
Repo https://github.com/bcrafton/ssdfa
Framework tf
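
In direct feedback alignment, the output error reaches each hidden layer through a fixed random matrix rather than the transposed forward weights, and sparsifying that matrix is the paper's key change. A small numpy sketch of one hidden-layer update; dimensions and the sparsity level are illustrative:

```python
# Sketch: one direct-feedback-alignment update for a hidden layer,
# with a fixed sparse random feedback matrix B that is never trained.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_out, sparsity = 256, 10, 0.9

B = rng.standard_normal((n_hidden, n_out))
B *= rng.random((n_hidden, n_out)) > sparsity   # mostly zeros

def dfa_update(W, x, h, e, lr=0.01):
    """W: input->hidden weights; x: input; h: hidden pre-activation;
    e: output error (y_hat - y). The error reaches this layer via B,
    not via the transposed forward weights as in backprop."""
    delta = (B @ e) * (h > 0)          # ReLU derivative as the gate
    return W - lr * np.outer(delta, x)

x = rng.standard_normal(784)
W = rng.standard_normal((n_hidden, 784)) * 0.01
h = W @ x
e = rng.standard_normal(n_out)         # stand-in for y_hat - y
W = dfa_update(W, x, h, e)
```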

Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Title Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification
Authors Guodong Ding, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli
Abstract Person re-identification aims to establish the correct identity correspondences of a person moving through a non-overlapping multi-camera installation. Recent advances based on deep learning models for this task mainly focus on supervised learning scenarios where accurate annotations are assumed to be available for each setup. Annotating large-scale datasets for person re-identification is demanding and burdensome, which renders the deployment of such supervised approaches to real-world applications infeasible. Therefore, it is necessary to train models without explicit supervision in an autonomous manner. In this paper, we propose an elegant and practical clustering approach for unsupervised person re-identification based on cluster validity. Concretely, we explore a fundamental concept in statistics, namely dispersion, to achieve a robust clustering criterion. Dispersion reflects the compactness of a cluster when employed at the intra-cluster level and reveals the separation between clusters when measured at the inter-cluster level. With this insight, we design a novel Dispersion-based Clustering (DBC) approach that can discover the underlying patterns in data. This approach considers a wider context of sample-level pairwise relationships to achieve a robust cluster affinity assessment that handles the complications that may arise due to prevalent imbalanced data distributions. Additionally, our solution can automatically prioritize standalone data points and prevent inferior clustering. Our extensive experimental analysis on image and video re-identification benchmarks demonstrates that our method outperforms state-of-the-art unsupervised methods by a significant margin. Code is available at https://github.com/gddingcs/Dispersion-based-Clustering.git.
Tasks Person Re-Identification, Unsupervised Person Re-Identification
Published 2019-06-04
URL https://arxiv.org/abs/1906.01308v1
PDF https://arxiv.org/pdf/1906.01308v1.pdf
PWC https://paperswithcode.com/paper/towards-better-validity-dispersion-based
Repo https://github.com/gddingcs/Dispersion-based-Clustering
Framework pytorch
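
Dispersion here is just average pairwise distance: low within a cluster signals compactness, high between clusters signals separation. A sketch of both quantities used as a merge criterion, as a simplification of the paper's DBC procedure; features and the merge rule are illustrative:

```python
# Sketch: intra- and inter-cluster dispersion as a merge criterion
# (a simplification of the paper's DBC algorithm).
import numpy as np

def dispersion(a, b=None):
    """Mean pairwise distance within `a`, or between `a` and `b`."""
    b = a if b is None else b
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.mean()

def should_merge(cluster_a, cluster_b, margin=1.0):
    inter = dispersion(cluster_a, cluster_b)
    intra = max(dispersion(cluster_a), dispersion(cluster_b))
    return inter < margin * intra   # merge only nearby, compact pairs

a = np.random.randn(20, 128)
b = np.random.randn(25, 128) + 0.2
print(should_merge(a, b))
```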

GraphAIR: Graph Representation Learning with Neighborhood Aggregation and Interaction

Title GraphAIR: Graph Representation Learning with Neighborhood Aggregation and Interaction
Authors Fenyu Hu, Yanqiao Zhu, Shu Wu, Weiran Huang, Liang Wang, Tieniu Tan
Abstract Graph representation learning is of paramount importance for a variety of graph analytical tasks, ranging from node classification to community detection. Recently, graph convolutional networks (GCNs) have been successfully applied to graph representation learning. These GCNs generate node representations by aggregating features from their neighborhoods, following the “neighborhood aggregation” scheme. In spite of achieving promising performance on various tasks, existing GCN-based models have difficulty capturing the complicated non-linearity of graph data. In this paper, we first theoretically prove that the coefficients of the neighborhood interaction terms are relatively small in current models, which explains why GCNs barely outperform linear models. Then, in order to better capture the complicated non-linearity of graph data, we present a novel GraphAIR framework that models neighborhood interaction in addition to neighborhood aggregation. Comprehensive experiments on benchmark tasks including node classification and link prediction using public datasets demonstrate the effectiveness of the proposed method over state-of-the-art methods.
Tasks Community Detection, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning
Published 2019-11-05
URL https://arxiv.org/abs/1911.01731v2
PDF https://arxiv.org/pdf/1911.01731v2.pdf
PWC https://paperswithcode.com/paper/graphair-graph-representation-learning-with
Repo https://github.com/CRIPAC-DIG/GraphAIR
Framework tf
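
The interaction term amounts to an element-wise product of aggregated neighborhood representations, added alongside the usual aggregation branch. A dense PyTorch sketch under that reading; real implementations use sparse adjacency matrices, and all sizes are illustrative:

```python
# Sketch: neighborhood aggregation plus pairwise interaction
# (GraphAIR-style), written densely for clarity. `adj` is the
# normalized adjacency matrix with self-loops.
import torch
import torch.nn as nn

class AIRLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_agg = nn.Linear(in_dim, out_dim)   # aggregation branch
        self.w_int = nn.Linear(in_dim, out_dim)   # interaction branch

    def forward(self, x, adj):
        agg = adj @ x                        # aggregate neighbor features
        interaction = agg * agg              # element-wise interaction
        return torch.relu(self.w_agg(agg) + self.w_int(interaction))

n, d = 5, 16
adj = torch.eye(n)                           # toy graph (self-loops only)
x = torch.randn(n, d)
print(AIRLayer(d, 32)(x, adj).shape)         # torch.Size([5, 32])
```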

MILDNet: A Lightweight Single Scaled Deep Ranking Architecture

Title MILDNet: A Lightweight Single Scaled Deep Ranking Architecture
Authors Anirudha Vishvakarma
Abstract Multi-scale deep CNN architectures [1, 2, 3] successfully capture both fine and coarse level image descriptors for the visual similarity task, but they come with expensive memory overhead and latency. In this paper, we propose a competing novel CNN architecture, called MILDNet, whose merit is being vastly more compact (about 3 times). Inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, we compressed our deep ranking model to a single CNN by coupling activations from multiple intermediate layers along with the last layer. Trained on the well-known Street2Shop dataset [4], we demonstrate that our approach performs as well as the current state-of-the-art models with only one third of the parameters, model size, and training time, and a significant reduction in inference time. The significance of intermediate layers for the image retrieval task has also been demonstrated on the popular Holidays, Oxford, and Paris datasets [5]. So even though our experiments are done in the e-commerce domain, the approach is applicable to other domains as well. We further conducted an ablation study to validate our hypothesis by checking the impact of adding each intermediate layer. With this we also present two more useful variants of MILDNet: a mobile model (12 times smaller) for on-edge devices, and a compactly featured model (512-d feature embeddings) for systems with less RAM and to reduce the ranking cost. Further, we present an intuitive way to automatically create a tailored in-house triplet training dataset, which is very hard to create manually. This solution can also be deployed as an all-inclusive visual similarity solution. Finally, we present our entire production-level architecture, which currently powers visual similarity at Fynd.
Tasks Fine-Grained Visual Recognition, Image Retrieval, Product Recommendation, Recommendation Systems
Published 2019-03-03
URL http://arxiv.org/abs/1903.00905v2
PDF http://arxiv.org/pdf/1903.00905v2.pdf
PWC https://paperswithcode.com/paper/mildnet-a-lightweight-single-scaled-deep
Repo https://github.com/gofynd/mildnet
Framework tf
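
Concretely, intermediate feature maps are globally pooled and concatenated with the final layer's output to form one embedding. A small self-contained sketch of that fusion; the toy backbone below stands in for the pretrained CNN the paper builds on:

```python
# Sketch: fusing intermediate-layer activations into a single ranking
# embedding (MILDNet-style). The toy backbone is illustrative only.
import torch
import torch.nn as nn

class MultiLayerEmbedding(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.proj = nn.Linear(32 + 64 + 128, embed_dim)

    def forward(self, x):
        pooled = []
        for block in (self.block1, self.block2, self.block3):
            x = block(x)
            pooled.append(x.mean(dim=(2, 3)))  # global average pool
        emb = self.proj(torch.cat(pooled, dim=1))
        return nn.functional.normalize(emb, dim=1)  # unit-norm embedding

print(MultiLayerEmbedding()(torch.randn(2, 3, 224, 224)).shape)
```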

Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Title Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Authors Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
Abstract Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Tasks Novel View Synthesis
Published 2019-06-04
URL https://arxiv.org/abs/1906.01618v2
PDF https://arxiv.org/pdf/1906.01618v2.pdf
PWC https://paperswithcode.com/paper/scene-representation-networks-continuous-3d
Repo https://github.com/vsitzmann/scene-representation-networks
Framework pytorch
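
The scene function maps a 3D world coordinate to a feature vector, and a learned ray marcher steps along each camera ray until it settles on the surface. A heavily simplified sketch of that loop, with a fixed step count in place of the paper's LSTM ray marcher; all network sizes are illustrative:

```python
# Sketch: differentiable ray marching against a learned scene function
# (SRN-style, heavily simplified). A fixed number of steps replaces
# the paper's LSTM-predicted step lengths.
import torch
import torch.nn as nn

scene_fn = nn.Sequential(               # world coord -> feature
    nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
step_fn = nn.Sequential(                # feature -> signed step length
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
rgb_fn = nn.Sequential(                 # feature at surface -> color
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())

def march(origins, dirs, n_steps=10):
    depth = torch.zeros(origins.shape[0], 1)
    for _ in range(n_steps):
        pts = origins + depth * dirs           # current ray positions
        feat = scene_fn(pts)
        depth = depth + step_fn(feat)          # learned step update
    return rgb_fn(scene_fn(origins + depth * dirs))

o = torch.zeros(4, 3)
d = torch.nn.functional.normalize(torch.randn(4, 3), dim=1)
print(march(o, d).shape)  # torch.Size([4, 3]) -- one RGB per ray
```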