January 31, 2020

3294 words 16 mins read

Paper Group AWR 379

#MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement. Simple yet efficient real-time pose-based action recognition. Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German. Unsupervised Dialog Structure Learning. EATEN: Entity-aware Attention for Single Shot Visual Text Extract …

Title #MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement
Authors Akash Gautam, Puneet Mathur, Rakesh Gosangi, Debanjan Mahata, Ramit Sawhney, Rajiv Ratn Shah
Abstract In this paper, we present a dataset containing 9,973 tweets related to the MeToo movement that were manually annotated for five different linguistic aspects: relevance, stance, hate speech, sarcasm, and dialogue acts. We present a detailed account of the data collection and annotation processes. The annotations have a very high inter-annotator agreement (0.79 to 0.93 k-alpha) due to the domain expertise of the annotators and clear annotation instructions. We analyze the data in terms of geographical distribution, label correlations, and keywords. Lastly, we present some potential use cases of this dataset. We expect this dataset will be of great interest to psycholinguists, sociolinguists, and computational linguists studying the discursive space of digitally mobilized social movements on sensitive issues like sexual harassment.
Tasks
Published 2019-12-14
URL https://arxiv.org/abs/1912.06927v1
PDF https://arxiv.org/pdf/1912.06927v1.pdf
PWC https://paperswithcode.com/paper/metooma-multi-aspect-annotations-of-tweets
Repo https://github.com/midas-research/MeTooMA
Framework none
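
The agreement statistic here appears to be Krippendorff's alpha. As a rough illustration of how such a figure is computed, here is a minimal sketch using the third-party krippendorff package (an assumption; the authors do not state their tooling), with annotators as rows and tweets as columns:

```python
# Sketch: computing inter-annotator agreement (Krippendorff's alpha)
# for a nominal label such as stance. Assumes the third-party
# `krippendorff` package (pip install krippendorff); the matrix below
# is illustrative, not the authors' actual data.
import numpy as np
import krippendorff

# reliability_data: one row per annotator, one column per tweet;
# np.nan marks tweets an annotator did not label.
reliability_data = np.array([
    [0, 1, 1, 0, 2, np.nan],   # annotator 1
    [0, 1, 1, 0, 2, 1],        # annotator 2
    [0, 1, 0, 0, 2, 1],        # annotator 3
])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```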

Simple yet efficient real-time pose-based action recognition

Title Simple yet efficient real-time pose-based action recognition
Authors Dennis Ludl, Thomas Gulde, Cristóbal Curio
Abstract Recognizing human actions is a core challenge for autonomous systems, as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. In order to train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time, and recognize their actions in real time with standard monocular camera sensors. For action recognition, we encode the human pose into a new data format called the Encoded Human Pose Image (EHPI), which can then be classified using standard methods from the computer vision community. With this simple procedure we achieve performance competitive with the state of the art in pose-based action detection while ensuring real-time operation. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
Tasks Action Detection, Autonomous Driving, Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-04-19
URL http://arxiv.org/abs/1904.09140v1
PDF http://arxiv.org/pdf/1904.09140v1.pdf
PWC https://paperswithcode.com/paper/simple-yet-efficient-real-time-pose-based
Repo https://github.com/noboevbo/ehpi_action_recognition
Framework pytorch
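
The EHPI idea is to rasterize a window of 2D poses into an image so that a standard image CNN can do the classification. A minimal sketch of one plausible encoding, assuming normalized (x, y) joint coordinates; the exact channel layout of the paper's EHPI may differ:

```python
# Sketch: encoding a sequence of 2D poses as an image (EHPI-style).
# Assumes `poses` has shape (T, J, 2) with coordinates already
# normalized to [0, 1]; the precise layout used in the paper may differ.
import numpy as np

def encode_pose_sequence(poses: np.ndarray) -> np.ndarray:
    """Map T frames of J joints to a (J, T, 3) uint8 image:
    x in the red channel, y in the green channel, blue unused."""
    T, J, _ = poses.shape
    img = np.zeros((J, T, 3), dtype=np.uint8)
    img[:, :, 0] = (poses[:, :, 0].T * 255).astype(np.uint8)  # x coords
    img[:, :, 1] = (poses[:, :, 1].T * 255).astype(np.uint8)  # y coords
    return img  # classify with any standard image CNN

poses = np.random.rand(32, 15, 2)  # 32 frames, 15 joints
ehpi = encode_pose_sequence(poses)
print(ehpi.shape)  # (15, 32, 3)
```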

Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German

Title Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German
Authors Lucy Linder, Michael Jungo, Jean Hennebert, Claudiu Musat, Andreas Fischer
Abstract This paper presents SwissCrawl, the largest Swiss German text corpus to date. Composed of more than half a million sentences, it was generated using a customized web scraping tool that could be applied to other low-resource languages as well. The approach demonstrates how freely available web pages can be used to construct comprehensive text corpora, which are of fundamental importance for natural language processing. In an experimental evaluation, we show that using the new corpus leads to significant improvements for the task of language modeling. To capture new content, our approach will run continuously to keep increasing the corpus over time.
Tasks Language Modelling
Published 2019-11-30
URL https://arxiv.org/abs/1912.00159v2
PDF https://arxiv.org/pdf/1912.00159v2.pdf
PWC https://paperswithcode.com/paper/automatic-creation-of-text-corpora-for-low
Repo https://github.com/derlin/swisstext
Framework none
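
A toy sketch of the crawl-and-filter idea behind such corpus construction; `is_swiss_german` is a hypothetical stand-in for the trained language identifier a real pipeline like SwissCrawl would rely on:

```python
# Sketch: harvest sentences from a web page and keep those a
# classifier flags as Swiss German. The filter below is a placeholder,
# not the SwissCrawl pipeline's actual model.
import re
import requests
from bs4 import BeautifulSoup

def is_swiss_german(sentence: str) -> bool:
    # Placeholder: a real pipeline uses a trained language identifier.
    return len(sentence.split()) >= 4

def harvest(url: str):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(" ")
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if is_swiss_german(s.strip())]
```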

Unsupervised Dialog Structure Learning

Title Unsupervised Dialog Structure Learning
Authors Weiyan Shi, Tiancheng Zhao, Zhou Yu
Abstract Learning a shared dialog structure from a set of task-oriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs and, more importantly, contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Unlike existing HMM-based models, our model is based on the variational autoencoder (VAE) and is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that, qualitatively, our method extracts meaningful dialog structure and, quantitatively, outperforms previous models in its ability to predict unseen data. We further evaluate the model's effectiveness in a downstream task, dialog system building. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting.
Tasks
Published 2019-04-07
URL https://arxiv.org/abs/1904.03736v2
PDF https://arxiv.org/pdf/1904.03736v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-dialog-structure-learning
Repo https://github.com/wyshi/Unsupervised-Structure-Learning
Framework tf
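
The discrete latent state at each dialog turn can be kept differentiable with a Gumbel-softmax relaxation, which is one standard choice. A hedged sketch of a single recurrence step; names and sizes are illustrative, and the paper's model additionally has separate prior and decoder networks:

```python
# Sketch: one step of a VRNN-style recurrence with a discrete latent
# dialog state via Gumbel-softmax. Illustrative only; the paper's model
# also has a prior network and an utterance decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVRNNStep(nn.Module):
    def __init__(self, x_dim=64, h_dim=128, n_states=10):
        super().__init__()
        self.posterior = nn.Linear(x_dim + h_dim, n_states)
        self.rnn = nn.GRUCell(x_dim + n_states, h_dim)

    def forward(self, x_t, h_prev, tau=1.0):
        logits = self.posterior(torch.cat([x_t, h_prev], dim=-1))
        # Differentiable sample of a (near) one-hot dialog state.
        z_t = F.gumbel_softmax(logits, tau=tau, hard=True)
        h_t = self.rnn(torch.cat([x_t, z_t], dim=-1), h_prev)
        return z_t, h_t

step = DiscreteVRNNStep()
x, h = torch.randn(4, 64), torch.zeros(4, 128)
z, h = step(x, h)
print(z.argmax(-1))  # inferred discrete state per dialog
```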

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Title EATEN: Entity-aware Attention for Single Shot Visual Text Extraction
Authors He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding
Abstract Extracting entities from images is a crucial part of many OCR applications, such as entity recognition for cards, invoices, and receipts. Most existing works employ the classical detection-and-recognition paradigm. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, an end-to-end trainable system that extracts entities without any post-processing. In the proposed framework, each entity is parsed by its own entity-aware decoder. Moreover, we introduce a state transition mechanism that further improves the robustness of entity extraction. Given the absence of public benchmarks, we construct a dataset of almost 0.6 million images across three real-world scenarios (train ticket, passport, and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of our knowledge, EATEN is the first single-shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate the state-of-the-art performance of EATEN.
Tasks Entity Extraction, Optical Character Recognition
Published 2019-09-20
URL https://arxiv.org/abs/1909.09380v1
PDF https://arxiv.org/pdf/1909.09380v1.pdf
PWC https://paperswithcode.com/paper/eaten-entity-aware-attention-for-single-shot
Repo https://github.com/beacandler/EATEN
Framework none
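
The framework's per-entity decoding can be approximated by giving each target field its own attention decoder over shared visual features. A simplified sketch under that assumption; the entity list, sizes, and attention form are illustrative, and the paper's state transition mechanism is omitted:

```python
# Sketch: one attention decoder per entity over shared CNN features
# (EATEN-style, simplified). Field names and dimensions are
# hypothetical; the state transition mechanism is not modeled.
import torch
import torch.nn as nn

class EntityDecoder(nn.Module):
    def __init__(self, feat_dim=256, vocab=100, max_len=20):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)
        self.gru = nn.GRUCell(feat_dim, feat_dim)
        self.out = nn.Linear(feat_dim, vocab)
        self.max_len = max_len

    def forward(self, feats):                     # feats: (B, N, D)
        h = feats.mean(dim=1)                     # initial decoder state
        chars = []
        for _ in range(self.max_len):
            w = torch.softmax(                    # attention over regions
                self.attn(torch.tanh(feats + h.unsqueeze(1))), dim=1)
            ctx = (w * feats).sum(dim=1)          # attended visual context
            h = self.gru(ctx, h)
            chars.append(self.out(h).argmax(-1))  # greedy character pick
        return torch.stack(chars, dim=1)          # (B, max_len)

# One decoder per entity of interest (field names are hypothetical).
decoders = nn.ModuleDict({k: EntityDecoder() for k in
                          ("name", "date", "number")})
feats = torch.randn(2, 196, 256)                  # shared CNN features
preds = {k: dec(feats) for k, dec in decoders.items()}
```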

Global Aggregation then Local Distribution in Fully Convolutional Networks

Title Global Aggregation then Local Distribution in Fully Convolutional Networks
Authors Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong
Abstract It has been widely proven that modelling long-range dependencies in fully convolutional networks (FCNs) via global aggregation modules is critical for complex scene understanding tasks such as semantic segmentation and object detection. However, global aggregation is often dominated by features of large patterns and tends to oversmooth regions that contain small patterns (e.g., boundaries and small objects). To resolve this problem, we propose to first use Global Aggregation and then Local Distribution, which we call GALD, where long-range dependencies are used more confidently inside large-pattern regions and more cautiously elsewhere. The size of each pattern at each position is estimated in the network as a per-channel mask map. GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and it consistently improves the performance of state-of-the-art object detection and instance segmentation approaches. In particular, GALD applied to semantic segmentation achieves new state-of-the-art performance on the Cityscapes test set with 83.3% mIoU. Code is available at https://github.com/lxtGH/GALD-Net
Tasks Instance Segmentation, Object Detection, Scene Understanding, Semantic Segmentation
Published 2019-09-16
URL https://arxiv.org/abs/1909.07229v1
PDF https://arxiv.org/pdf/1909.07229v1.pdf
PWC https://paperswithcode.com/paper/global-aggregation-then-local-distribution-in
Repo https://github.com/lxtGH/GALD-Net
Framework pytorch
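
A simplified sketch of the aggregate-then-distribute pattern: global context is computed once, then gated per channel and position by a predicted mask. The published GALD supports several aggregation modules; this sketch uses plain global average pooling, and all sizes are illustrative:

```python
# Sketch: Global Aggregation then Local Distribution (simplified).
# Global average pooling stands in for the paper's aggregation modules.
import torch
import torch.nn as nn

class SimpleGALD(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(          # per-channel, per-position mask
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        g = x.mean(dim=(2, 3), keepdim=True)   # global aggregation
        m = self.mask(x)                       # local distribution weights
        return x + m * g                       # add context where trusted

feat = torch.randn(2, 256, 64, 64)
out = SimpleGALD(256)(feat)
print(out.shape)  # torch.Size([2, 256, 64, 64])
```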

Generator evaluator-selector net: a modular approach for panoptic segmentation

Title Generator evaluator-selector net: a modular approach for panoptic segmentation
Authors Sagi Eppel, Alan Aspuru-Guzik
Abstract In machine learning and other fields, suggesting a good solution to a problem is usually a harder task than evaluating the quality of such a solution. This asymmetry is the basis for a large number of selection-oriented methods that use a generator system to guess a set of solutions and an evaluator system to rank and select the best ones. This work examines the use of this approach for image segmentation. The generator/evaluator approach in this case consists of two independent convolutional neural nets: a generator net that suggests a variety of segments corresponding to objects and distinct regions in the image, and an evaluator net that chooses the best segments to be merged into the segmentation map. The result is a trial-and-error evolutionary approach in which a generator that guesses segments with low average accuracy, but with wide variability, can still produce good results when coupled with an accurate evaluator. Generating and evaluating each segment separately is essential here, since it demands exponentially fewer guesses than a system that guesses and evaluates the full segmentation map in each try. Another form of modularity used in this system is the separation of segmentation and classification into independent neural nets, which allows the segmentation to be class-agnostic and hence capable of segmenting unfamiliar categories that were not part of the training set. The method was examined on the COCO panoptic segmentation benchmark and gave results competitive with standard semantic segmentation and instance segmentation methods.
Tasks Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published 2019-08-24
URL https://arxiv.org/abs/1908.09108v2
PDF https://arxiv.org/pdf/1908.09108v2.pdf
PWC https://paperswithcode.com/paper/generator-evaluator-selector-net-a-modular
Repo https://github.com/sagieppel/Generator-evaluator-selector-net-a-modular-approach-for-panoptic-segmentation
Framework pytorch
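
The selection step itself can be as simple as scoring every proposed segment and greedily keeping the best non-overlapping ones. A hedged sketch of that loop, with the two networks abstracted as callables and the merging rule simplified relative to the paper:

```python
# Sketch: greedy generator/evaluator selection of segments.
# `generator` and `evaluator` stand in for the two CNNs; scoring and
# overlap handling here are simplified relative to the paper.
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def select_segments(image, generator, evaluator, iou_thresh=0.5):
    segments = generator(image)              # list of boolean masks
    scores = [evaluator(image, s) for s in segments]
    order = np.argsort(scores)[::-1]         # best-scored first
    chosen = []
    for i in order:
        s = segments[i]
        # Skip segments that substantially overlap an accepted one.
        if all(iou(s, c) < iou_thresh for c in chosen):
            chosen.append(s)
    return chosen                            # masks to merge into the map
```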

InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting

Title InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting
Authors Hao-Shu Fang, Jianhua Sun, Runzhong Wang, Minghao Gou, Yong-Lu Li, Cewu Lu
Abstract Instance segmentation requires a large number of training samples to achieve satisfactory performance and benefits from proper data augmentation. To enlarge the training set and increase its diversity, previous methods have investigated using data annotations from other domains (e.g., bounding boxes, points) in a weakly supervised mechanism. In this paper, we present a simple, efficient and effective method to augment the training set using the existing instance mask annotations. Exploiting the pixel redundancy of the background, we are able to improve the performance of Mask R-CNN by 1.7 mAP on the COCO dataset and 3.3 mAP on the Pascal VOC dataset by simply introducing random jittering to objects. Furthermore, we propose a location probability map based approach to explore the feasible locations where objects can be placed, based on local appearance similarity. With the guidance of such a map, we boost the instance segmentation performance of R101-Mask R-CNN from 35.7 mAP to 37.9 mAP without modifying the backbone or network structure. Our method is simple to implement and does not increase the computational complexity. It can be integrated into the training pipeline of any instance segmentation model without affecting the training and inference efficiency. Our code and models have been released at https://github.com/GothicAi/InstaBoost
Tasks Data Augmentation, Instance Segmentation, Semantic Segmentation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07801v1
PDF https://arxiv.org/pdf/1908.07801v1.pdf
PWC https://paperswithcode.com/paper/instaboost-boosting-instance-segmentation-via
Repo https://github.com/GothicAi/InstaBoost
Framework pytorch
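
The cheapest form of the idea is to cut an object out along its mask and paste it back at a jittered location. A simplified numpy sketch; the full method additionally builds a location probability map from local appearance similarity and fills the vacated background more carefully:

```python
# Sketch: random-jitter copy-paste of one annotated instance.
# Simplified: the hole left behind is filled with the object's mean
# color, whereas the paper exploits background pixel redundancy.
import numpy as np

def jitter_instance(img, mask, max_shift=20, rng=np.random):
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
    out, new_mask = img.copy(), np.zeros_like(mask)
    out[mask] = img[mask].mean(axis=0)       # crude background fill
    ys, xs = np.nonzero(mask)
    ys2 = np.clip(ys + dy, 0, img.shape[0] - 1)
    xs2 = np.clip(xs + dx, 0, img.shape[1] - 1)
    out[ys2, xs2] = img[ys, xs]              # paste object at new spot
    new_mask[ys2, xs2] = True
    return out, new_mask                     # updated training sample

img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
mask = np.zeros((128, 128), dtype=bool)
mask[40:80, 40:80] = True
aug_img, aug_mask = jitter_instance(img, mask)
```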

MTab: Matching Tabular Data to Knowledge Graph using Probability Models

Title MTab: Matching Tabular Data to Knowledge Graph using Probability Models
Authors Phuc Nguyen, Natthawut Kertkeidkachorn, Ryutaro Ichise, Hideaki Takeda
Abstract This paper presents the design of our system, MTab, for the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2019). MTab combines a voting algorithm and probability models to solve the critical problems of the matching tasks. Results on SemTab 2019 show that MTab obtains promising performance on the three matching tasks.
Tasks Graph Matching
Published 2019-10-01
URL https://arxiv.org/abs/1910.00246v2
PDF https://arxiv.org/pdf/1910.00246v2.pdf
PWC https://paperswithcode.com/paper/mtab-matching-tabular-data-to-knowledge-graph
Repo https://github.com/phucty/MTab
Framework none
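
A toy sketch of a voting step in this spirit: each cell votes for the classes of its candidate entities, and the column takes the majority class. Everything here, including the lookup function, is illustrative rather than MTab's actual pipeline:

```python
# Sketch: majority-vote class annotation for one table column.
# `lookup` stands in for a knowledge graph candidate search; all
# names and the toy KB are hypothetical.
from collections import Counter

def vote_column_class(cells, lookup):
    """Each cell votes for the classes of its candidate entities;
    the column is annotated with the majority class."""
    votes = Counter()
    for cell in cells:
        for entity, cls in lookup(cell):   # candidate (entity, class)
            votes[cls] += 1
    return votes.most_common(1)[0][0] if votes else None

def toy_lookup(cell):
    kb = {"Tokyo": [("Q1490", "city")], "Paris": [("Q90", "city")],
          "Japan": [("Q17", "country")]}
    return kb.get(cell, [])

print(vote_column_class(["Tokyo", "Paris"], toy_lookup))  # city
```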

GAN Based Image Deblurring Using Dark Channel Prior

Title GAN Based Image Deblurring Using Dark Channel Prior
Authors Shuang Zhang, Ada Zhen, Robert L. Stevenson
Abstract A conditional generative adversarial network (GAN) is proposed for the image deblurring problem. It is tailored for image deblurring rather than simply applying a GAN to the deblurring problem. Motivated by this, the dark channel prior is carefully chosen and incorporated into the loss function for network training. To make it compatible with neural networks, its original non-differentiable form is discarded and an L2 norm is adopted instead. On both synthetic datasets and noisy natural images, the proposed network shows improved deblurring performance and robustness to image noise, both qualitatively and quantitatively. Additionally, compared to existing end-to-end deblurring networks, our network structure is lightweight, which ensures less training and testing time.
Tasks Deblurring
Published 2019-02-28
URL http://arxiv.org/abs/1903.00107v1
PDF http://arxiv.org/pdf/1903.00107v1.pdf
PWC https://paperswithcode.com/paper/gan-based-image-deblurring-using-dark-channel
Repo https://github.com/tanyakatiyar/image-deblur
Framework none
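
The dark channel of an image is the patch-wise minimum over all color channels; sharp natural images have mostly zero dark channels, so penalizing the dark channel's L2 norm discourages blur. A sketch of a differentiable version in PyTorch, with the patch size an assumption:

```python
# Sketch: differentiable dark channel and an L2 loss on it.
# Min-pooling is expressed as negated max-pooling; the 15x15 patch
# is a common choice, not necessarily the paper's exact setting.
import torch
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """img: (B, 3, H, W) in [0, 1] -> (B, 1, H, W) dark channel."""
    per_pixel_min = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-per_pixel_min, patch, stride=1,
                         padding=patch // 2)

def dark_channel_loss(deblurred):
    return dark_channel(deblurred).pow(2).mean()  # L2 surrogate

x = torch.rand(2, 3, 64, 64, requires_grad=True)
dark_channel_loss(x).backward()  # gradients flow through the min/max
```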

Direct Feedback Alignment with Sparse Connections for Local Learning

Title Direct Feedback Alignment with Sparse Connections for Local Learning
Authors Brian Crafton, Abhinav Parihar, Evan Gebhardt, Arijit Raychowdhury
Abstract Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Commonly referred to as the weight transport problem, each neuron's dependence on the weights and errors located deeper in the network requires exhaustive data movement, which presents a key obstacle to enhancing the performance and energy efficiency of machine-learning hardware. In this work, we propose a bio-plausible alternative to backpropagation, drawing from advances in feedback alignment algorithms, in which the error computation at a single synapse reduces to the product of three scalar values. Using a sparse feedback matrix, we show that a neuron needs only a fraction of the information previously used by feedback alignment algorithms. Consequently, memory and compute can be partitioned and distributed whichever way produces the most efficient forward pass, so long as a single error can be delivered to each neuron. Our results show orders-of-magnitude improvement in data movement and a 2× improvement in multiply-and-accumulate operations over backpropagation. Like previous work, we observe that any variant of feedback alignment suffers significant losses in classification accuracy on deep convolutional neural networks. By transferring trained convolutional layers and training the fully connected layers using direct feedback alignment, we demonstrate that direct feedback alignment can obtain results competitive with backpropagation. Furthermore, we observe that using an extremely sparse feedback matrix, rather than a dense one, results in a small accuracy drop while yielding hardware advantages. All the code and results are available at https://github.com/bcrafton/ssdfa.
Tasks
Published 2019-01-30
URL https://arxiv.org/abs/1903.02083v2
PDF https://arxiv.org/pdf/1903.02083v2.pdf
PWC https://paperswithcode.com/paper/direct-feedback-alignment-with-sparse
Repo https://github.com/bcrafton/ssdfa
Framework tf
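
In direct feedback alignment, the output error reaches each hidden layer through a fixed random matrix rather than the transposed forward weights, and sparsifying that matrix is the paper's key change. A small numpy sketch of one hidden-layer update; dimensions and the sparsity level are illustrative:

```python
# Sketch: one direct-feedback-alignment update for a hidden layer,
# with a fixed sparse random feedback matrix B that is never trained.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_out, sparsity = 256, 10, 0.9

B = rng.standard_normal((n_hidden, n_out))
B *= rng.random((n_hidden, n_out)) > sparsity   # mostly zeros

def dfa_update(W, x, h, e, lr=0.01):
    """W: input->hidden weights; x: input; h: hidden pre-activation;
    e: output error (y_hat - y). The error reaches this layer via B,
    not via the transposed forward weights as in backprop."""
    delta = (B @ e) * (h > 0)          # ReLU derivative as the gate
    return W - lr * np.outer(delta, x)

x = rng.standard_normal(784)
W = rng.standard_normal((n_hidden, 784)) * 0.01
h = W @ x
e = rng.standard_normal(n_out)         # stand-in for y_hat - y
W = dfa_update(W, x, h, e)
```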

Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification

Title Towards better Validity: Dispersion based Clustering for Unsupervised Person Re-identification
Authors Guodong Ding, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli
Abstract Person re-identification aims to establish the correct identity correspondences of a person moving through a non-overlapping multi-camera installation. Recent advances based on deep learning models for this task mainly focus on supervised learning scenarios where accurate annotations are assumed to be available for each setup. Annotating large-scale datasets for person re-identification is demanding and burdensome, which renders the deployment of such supervised approaches to real-world applications infeasible. Therefore, it is necessary to train models without explicit supervision in an autonomous manner. In this paper, we propose an elegant and practical clustering approach for unsupervised person re-identification based on cluster validity. Concretely, we explore a fundamental concept in statistics, namely dispersion, to achieve a robust clustering criterion. Dispersion reflects the compactness of a cluster when employed at the intra-cluster level and reveals the separation between clusters when measured at the inter-cluster level. With this insight, we design a novel Dispersion-based Clustering (DBC) approach that can discover the underlying patterns in data. This approach considers a wider context of sample-level pairwise relationships to achieve a robust cluster affinity assessment that handles the complications that may arise due to prevalent imbalanced data distributions. Additionally, our solution can automatically prioritize standalone data points and prevent inferior clustering. Our extensive experimental analysis on image and video re-identification benchmarks demonstrates that our method outperforms state-of-the-art unsupervised methods by a significant margin. Code is available at https://github.com/gddingcs/Dispersion-based-Clustering.git.
Tasks Person Re-Identification, Unsupervised Person Re-Identification
Published 2019-06-04
URL https://arxiv.org/abs/1906.01308v1
PDF https://arxiv.org/pdf/1906.01308v1.pdf
PWC https://paperswithcode.com/paper/towards-better-validity-dispersion-based
Repo https://github.com/gddingcs/Dispersion-based-Clustering
Framework pytorch
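
Dispersion here is just average pairwise distance: low within a cluster signals compactness, high between clusters signals separation. A sketch of both quantities used as a merge criterion, as a simplification of the paper's DBC procedure; features and the merge rule are illustrative:

```python
# Sketch: intra- and inter-cluster dispersion as a merge criterion
# (a simplification of the paper's DBC algorithm).
import numpy as np

def dispersion(a, b=None):
    """Mean pairwise distance within `a`, or between `a` and `b`."""
    b = a if b is None else b
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.mean()

def should_merge(cluster_a, cluster_b, margin=1.0):
    inter = dispersion(cluster_a, cluster_b)
    intra = max(dispersion(cluster_a), dispersion(cluster_b))
    return inter < margin * intra   # merge only nearby, compact pairs

a = np.random.randn(20, 128)
b = np.random.randn(25, 128) + 0.2
print(should_merge(a, b))
```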

GraphAIR: Graph Representation Learning with Neighborhood Aggregation and Interaction

Title GraphAIR: Graph Representation Learning with Neighborhood Aggregation and Interaction
Authors Fenyu Hu, Yanqiao Zhu, Shu Wu, Weiran Huang, Liang Wang, Tieniu Tan
Abstract Graph representation learning is of paramount importance for a variety of graph analytical tasks, ranging from node classification to community detection. Recently, graph convolutional networks (GCNs) have been successfully applied to graph representation learning. These GCNs generate node representations by aggregating features from their neighborhoods, following the “neighborhood aggregation” scheme. In spite of achieving promising performance on various tasks, existing GCN-based models have difficulty capturing the complicated non-linearity of graph data. In this paper, we first theoretically prove that the coefficients of the neighborhood interaction terms are relatively small in current models, which explains why GCNs barely outperform linear models. Then, in order to better capture the complicated non-linearity of graph data, we present a novel GraphAIR framework that models neighborhood interaction in addition to neighborhood aggregation. Comprehensive experiments on benchmark tasks including node classification and link prediction using public datasets demonstrate the effectiveness of the proposed method over state-of-the-art methods.
Tasks Community Detection, Graph Representation Learning, Link Prediction, Node Classification, Representation Learning
Published 2019-11-05
URL https://arxiv.org/abs/1911.01731v2
PDF https://arxiv.org/pdf/1911.01731v2.pdf
PWC https://paperswithcode.com/paper/graphair-graph-representation-learning-with
Repo https://github.com/CRIPAC-DIG/GraphAIR
Framework tf
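
The interaction term amounts to an element-wise product of aggregated neighborhood representations, added alongside the usual aggregation branch. A dense PyTorch sketch under that reading; real implementations use sparse adjacency matrices, and all sizes are illustrative:

```python
# Sketch: neighborhood aggregation plus pairwise interaction
# (GraphAIR-style), written densely for clarity. `adj` is the
# normalized adjacency matrix with self-loops.
import torch
import torch.nn as nn

class AIRLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_agg = nn.Linear(in_dim, out_dim)   # aggregation branch
        self.w_int = nn.Linear(in_dim, out_dim)   # interaction branch

    def forward(self, x, adj):
        agg = adj @ x                        # aggregate neighbor features
        interaction = agg * agg              # element-wise interaction
        return torch.relu(self.w_agg(agg) + self.w_int(interaction))

n, d = 5, 16
adj = torch.eye(n)                           # toy graph (self-loops only)
x = torch.randn(n, d)
print(AIRLayer(d, 32)(x, adj).shape)         # torch.Size([5, 32])
```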

MILDNet: A Lightweight Single Scaled Deep Ranking Architecture

Title MILDNet: A Lightweight Single Scaled Deep Ranking Architecture
Authors Anirudha Vishvakarma
Abstract Multi-scale deep CNN architectures [1, 2, 3] successfully capture both fine and coarse level image descriptors for the visual similarity task, but they come with expensive memory overhead and latency. In this paper, we propose a competing novel CNN architecture, called MILDNet, whose merit is being vastly more compact (about 3 times). Inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, we compressed our deep ranking model to a single CNN by coupling activations from multiple intermediate layers along with the last layer. Trained on the well-known Street2Shop dataset [4], we demonstrate that our approach performs as well as the current state-of-the-art models with only one third of the parameters, model size, and training time, and a significant reduction in inference time. The significance of intermediate layers for the image retrieval task has also been demonstrated on the popular Holidays, Oxford, and Paris datasets [5]. So even though our experiments are done in the e-commerce domain, the approach is applicable to other domains as well. We further conducted an ablation study to validate our hypothesis by checking the impact of adding each intermediate layer. With this we also present two more useful variants of MILDNet: a mobile model (12 times smaller) for on-edge devices, and a compactly featured model (512-d feature embeddings) for systems with less RAM and to reduce the ranking cost. Further, we present an intuitive way to automatically create a tailored in-house triplet training dataset, which is very hard to create manually. This solution can also be deployed as an all-inclusive visual similarity solution. Finally, we present our entire production-level architecture, which currently powers visual similarity at Fynd.
Tasks Fine-Grained Visual Recognition, Image Retrieval, Product Recommendation, Recommendation Systems
Published 2019-03-03
URL http://arxiv.org/abs/1903.00905v2
PDF http://arxiv.org/pdf/1903.00905v2.pdf
PWC https://paperswithcode.com/paper/mildnet-a-lightweight-single-scaled-deep
Repo https://github.com/gofynd/mildnet
Framework tf
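
Concretely, intermediate feature maps are globally pooled and concatenated with the final layer's output to form one embedding. A small self-contained sketch of that fusion; the toy backbone below stands in for the pretrained CNN the paper builds on:

```python
# Sketch: fusing intermediate-layer activations into a single ranking
# embedding (MILDNet-style). The toy backbone is illustrative only.
import torch
import torch.nn as nn

class MultiLayerEmbedding(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        self.proj = nn.Linear(32 + 64 + 128, embed_dim)

    def forward(self, x):
        pooled = []
        for block in (self.block1, self.block2, self.block3):
            x = block(x)
            pooled.append(x.mean(dim=(2, 3)))  # global average pool
        emb = self.proj(torch.cat(pooled, dim=1))
        return nn.functional.normalize(emb, dim=1)  # unit-norm embedding

print(MultiLayerEmbedding()(torch.randn(2, 3, 224, 224)).shape)
```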

Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Title Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Authors Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
Abstract Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Tasks Novel View Synthesis
Published 2019-06-04
URL https://arxiv.org/abs/1906.01618v2
PDF https://arxiv.org/pdf/1906.01618v2.pdf
PWC https://paperswithcode.com/paper/scene-representation-networks-continuous-3d
Repo https://github.com/vsitzmann/scene-representation-networks
Framework pytorch
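
The scene function maps a 3D world coordinate to a feature vector, and a learned ray marcher steps along each camera ray until it settles on the surface. A heavily simplified sketch of that loop, with a fixed step count in place of the paper's LSTM ray marcher; all network sizes are illustrative:

```python
# Sketch: differentiable ray marching against a learned scene function
# (SRN-style, heavily simplified). A fixed number of steps replaces
# the paper's LSTM-predicted step lengths.
import torch
import torch.nn as nn

scene_fn = nn.Sequential(               # world coord -> feature
    nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 128))
step_fn = nn.Sequential(                # feature -> signed step length
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
rgb_fn = nn.Sequential(                 # feature at surface -> color
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())

def march(origins, dirs, n_steps=10):
    depth = torch.zeros(origins.shape[0], 1)
    for _ in range(n_steps):
        pts = origins + depth * dirs           # current ray positions
        feat = scene_fn(pts)
        depth = depth + step_fn(feat)          # learned step update
    return rgb_fn(scene_fn(origins + depth * dirs))

o = torch.zeros(4, 3)
d = torch.nn.functional.normalize(torch.randn(4, 3), dim=1)
print(march(o, d).shape)  # torch.Size([4, 3]) -- one RGB per ray
```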