January 31, 2020

3353 words 16 mins read

Paper Group AWR 431


d-blink: Distributed End-to-End Bayesian Entity Resolution. PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification. QASC: A Dataset for Question Answering via Sentence Composition. Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks. Optimal Transport-based Alignment of Learned Character Re …

d-blink: Distributed End-to-End Bayesian Entity Resolution

Title d-blink: Distributed End-to-End Bayesian Entity Resolution
Authors Neil G. Marchant, Rebecca C. Steorts, Andee Kaplan, Benjamin I. P. Rubinstein, Daniel N. Elazar
Abstract Entity resolution (ER) (record linkage or de-duplication) is the process of merging together noisy databases, often in the absence of a unique identifier. A major advancement in ER methodology has been the application of Bayesian generative models. Such models provide a natural framework for clustering records to unobserved (latent) entities, while providing exact uncertainty quantification and tight performance bounds. Despite these advancements, existing models do not scale to realistically-sized databases (larger than 1000 records) and they do not incorporate probabilistic blocking. In this paper, we propose “distributed Bayesian linkage” or d-blink – the first scalable and distributed end-to-end Bayesian model for ER, which propagates uncertainty in blocking, matching and merging. We make several novel contributions, including: (i) incorporating probabilistic blocking directly into the model through auxiliary partitions; (ii) support for missing values; (iii) a partially-collapsed Gibbs sampler; and (iv) a novel perturbation sampling algorithm (leveraging the Vose-Alias method) that enables fast updates of the entity attributes. Finally, we conduct experiments on five data sets which show that d-blink can achieve significant efficiency gains – in excess of 300$\times$ – when compared to existing non-distributed methods.
Tasks Entity Resolution
Published 2019-09-13
URL https://arxiv.org/abs/1909.06039v1
PDF https://arxiv.org/pdf/1909.06039v1.pdf
PWC https://paperswithcode.com/paper/d-blink-distributed-end-to-end-bayesian
Repo https://github.com/ngmarchant/dblink-experiments
Framework none
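
Contribution (iv) above relies on the Vose alias method, a classic trick for O(1) sampling from a fixed discrete distribution after O(n) table construction. Below is a minimal Python sketch of the general technique, not the authors' Scala/Spark implementation; the example weights are arbitrary.

```python
import random

def build_alias_table(probs):
    """Vose's algorithm: O(n) construction of an alias table for a discrete distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]          # rescale so the average bucket mass is 1
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l     # bucket s is topped up by donor l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                  # leftovers are (numerically) full buckets
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """O(1) sampling: pick a bucket uniformly, then keep it or jump to its alias."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

# Example: draw from an unnormalised attribute-value distribution.
weights = [5.0, 1.0, 3.0, 1.0]
probs = [w / sum(weights) for w in weights]
table = build_alias_table(probs)
samples = [alias_draw(*table) for _ in range(10000)]
```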

PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification

Title PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
Authors Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge
Abstract Most existing work on adversarial data generation focuses on English. For example, PAWS (Paraphrase Adversaries from Word Scrambling) consists of challenging English paraphrase identification pairs from Wikipedia and Quora. We remedy this gap with PAWS-X, a new dataset of 23,659 human translated PAWS evaluation pairs in six typologically distinct languages: French, Spanish, German, Chinese, Japanese, and Korean. We provide baseline numbers for three models with different capacity to capture non-local context and sentence structure, and using different multilingual training and evaluation regimes. Multilingual BERT fine-tuned on PAWS English plus machine-translated data performs the best, with a range of 83.1-90.8 accuracy across the non-English languages and an average accuracy gain of 23% over the next best model. PAWS-X shows the effectiveness of deep, multilingual pre-training while also leaving considerable headroom as a new challenge to drive multilingual research that better captures structure and contextual information.
Tasks Paraphrase Identification
Published 2019-08-30
URL https://arxiv.org/abs/1908.11828v1
PDF https://arxiv.org/pdf/1908.11828v1.pdf
PWC https://paperswithcode.com/paper/paws-x-a-cross-lingual-adversarial-dataset
Repo https://github.com/google-research-datasets/paws
Framework none
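
The strongest baseline in the paper is multilingual BERT fine-tuned on PAWS plus machine-translated data. A minimal sketch of scoring a sentence pair with such a model through the HuggingFace Transformers API is below; this assumes the checkpoint has already been fine-tuned for 2-way paraphrase classification, and it is not the authors' released code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: mBERT with a 2-way head (paraphrase / not paraphrase).
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

s1 = "Der Fluss fließt durch die Stadt."
s2 = "Die Stadt wird von dem Fluss durchquert."

inputs = tokenizer(s1, s2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
prob_paraphrase = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(paraphrase) = {prob_paraphrase:.3f}")  # meaningless until fine-tuned on PAWS/PAWS-X
```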

QASC: A Dataset for Question Answering via Sentence Composition

Title QASC: A Dataset for Question Answering via Sentence Composition
Authors Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
Abstract Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags behind human performance by 20%.
Tasks Common Sense Reasoning, Question Answering
Published 2019-10-25
URL https://arxiv.org/abs/1910.11473v2
PDF https://arxiv.org/pdf/1910.11473v2.pdf
PWC https://paperswithcode.com/paper/qasc-a-dataset-for-question-answering-via
Repo https://github.com/allenai/qasc
Framework none
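
To make the two-hop retrieval challenge concrete, here is a toy lexical-overlap illustration (my own simplification, not the QASC retriever): the second fact has to connect to the answer option through words that the question itself does not contain.

```python
def tokens(text):
    return set(text.lower().split())

def two_hop_retrieve(question, option, corpus, k=2):
    """Toy two-hop retrieval by lexical overlap (illustrative only).

    Hop 1: facts overlapping the question.
    Hop 2: facts that connect a hop-1 fact to the answer option through *new*
    words, mirroring the idea that the second fact must introduce concepts
    absent from the question.
    """
    q = tokens(question)
    hop1 = sorted(corpus, key=lambda f: -len(tokens(f) & q))[:k]
    pairs = []
    for f1 in hop1:
        bridge = tokens(f1) - q                      # new concepts introduced by fact 1
        for f2 in corpus:
            if f2 == f1:
                continue
            score = len(tokens(f2) & bridge) + len(tokens(f2) & tokens(option))
            pairs.append((score, f1, f2))
    return sorted(pairs, reverse=True)[:k]

corpus = [
    "differential heating of air produces wind",
    "wind is used for producing electricity",
    "water vapor condenses into clouds",
]
print(two_hop_retrieve("what can be used to produce electricity?", "wind", corpus))
```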

Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks

Title Dressing as a Whole: Outfit Compatibility Learning Based on Node-wise Graph Neural Networks
Authors Zeyu Cui, Zekun Li, Shu Wu, Xiaoyu Zhang, Liang Wang
Abstract With the rapid development of the fashion market, customers’ demands for fashion recommendation are rising. In this paper, we aim to investigate a practical problem of fashion recommendation by answering the question “which item should we select to match with the given fashion items and form a compatible outfit”. The key to this problem is to estimate the outfit compatibility. Previous works which focus on the compatibility of two items or represent an outfit as a sequence fail to make full use of the complex relations among items in an outfit. To remedy this, we propose to represent an outfit as a graph. In particular, we construct a Fashion Graph, where each node represents a category and each edge represents interaction between two categories. Accordingly, each outfit can be represented as a subgraph by putting items into their corresponding category nodes. To infer the outfit compatibility from such a graph, we propose Node-wise Graph Neural Networks (NGNN) which can better model node interactions and learn better node representations. In NGNN, the node interaction on each edge is different, which is determined by parameters correlated to the two connected nodes. An attention mechanism is utilized to calculate the outfit compatibility score with learned node representations. NGNN can not only be used to model outfit compatibility from visual or textual modality but also from multiple modalities. We conduct experiments on two tasks: (1) Fill-in-the-blank: suggesting an item that matches with existing components of an outfit; (2) Compatibility prediction: predicting the compatibility scores of given outfits. Experimental results demonstrate the great superiority of our proposed method over others.
Tasks Recommendation Systems
Published 2019-02-21
URL http://arxiv.org/abs/1902.08009v1
PDF http://arxiv.org/pdf/1902.08009v1.pdf
PWC https://paperswithcode.com/paper/dressing-as-a-whole-outfit-compatibility
Repo https://github.com/CRIPAC-DIG/NGNN
Framework tf

Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Title Optimal Transport-based Alignment of Learned Character Representations for String Similarity
Authors Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum
Abstract String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. We evaluate STANCE’s ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE or one of its variants outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE’s ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B^3 F1 over the previous state-of-the-art approach.
Tasks Entity Resolution
Published 2019-07-23
URL https://arxiv.org/abs/1907.10165v1
PDF https://arxiv.org/pdf/1907.10165v1.pdf
PWC https://paperswithcode.com/paper/optimal-transport-based-alignment-of-learned
Repo https://github.com/iesl/stance
Framework pytorch
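
The alignment step poses character alignment as entropically regularized optimal transport solved by Sinkhorn iteration. A generic NumPy sketch of that iteration is below; it is not the STANCE code, which aligns learned character encodings rather than a hand-set cost matrix.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropic-OT transport plan between histograms a and b under a cost matrix.

    Returns P with row sums ~ a and column sums ~ b; smaller `reg` gives sharper alignments.
    """
    K = np.exp(-cost / reg)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy example: align characters of two strings under a 0/1 mismatch cost.
s, t = "kitten", "sitting"
cost = np.array([[0.0 if c1 == c2 else 1.0 for c2 in t] for c1 in s])
a = np.full(len(s), 1.0 / len(s))
b = np.full(len(t), 1.0 / len(t))
P = sinkhorn(cost, a, b)
print(np.round(P, 3))                        # soft character-to-character alignment
```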

LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking

Title LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking
Authors Guanghan Ning, Heng Huang
Abstract In this paper, we propose a novel effective light-weight framework, called LightTrack, for online human pose tracking. The proposed framework is designed to be generic for top-down pose tracking and is faster than existing online and offline methods. Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) are incorporated into one unified functioning entity, easily implemented by a replaceable single-person pose estimation module. Our framework unifies single-person pose tracking with multi-person identity association and sheds first light upon bridging keypoint tracking with object tracking. We also propose a Siamese Graph Convolution Network (SGCN) for human pose matching as a Re-ID module in our pose tracking system. In contrast to other Re-ID modules, we use a graphical representation of human joints for matching. The skeleton-based representation effectively captures human pose similarity and is computationally inexpensive. It is robust to sudden camera shifts that introduce human drifting. To the best of our knowledge, this is the first paper to propose an online human pose tracking framework in a top-down fashion. The proposed framework is general enough to fit other pose estimators and candidate matching mechanisms. Our method outperforms other online methods while maintaining a much higher frame rate, and is very competitive with our offline state-of-the-art. We make the code publicly available at: https://github.com/Guanghan/lighttrack.
Tasks Object Tracking, Pose Estimation, Pose Tracking, Visual Object Tracking
Published 2019-05-07
URL https://arxiv.org/abs/1905.02822v1
PDF https://arxiv.org/pdf/1905.02822v1.pdf
PWC https://paperswithcode.com/paper/lighttrack-a-generic-framework-for-online-top
Repo https://github.com/Guanghan/lighttrack
Framework tf
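
The quoted intuition, that a normalized skeleton is a cheap and camera-shift-robust representation for pose matching, can be illustrated in a few lines of NumPy. This is only the underlying idea; the paper's actual Re-ID module is a learned Siamese GCN.

```python
import numpy as np

def normalize_pose(joints):
    """Center the skeleton and scale it so the representation is invariant to
    translation and rough scale (camera shift / person size)."""
    joints = np.asarray(joints, dtype=float)        # shape (num_joints, 2)
    joints = joints - joints.mean(axis=0)
    scale = np.linalg.norm(joints, axis=1).max() + 1e-8
    return joints / scale

def pose_similarity(p1, p2):
    """Cosine similarity between flattened normalized skeletons, in [-1, 1]."""
    a, b = normalize_pose(p1).ravel(), normalize_pose(p2).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy usage: the same 5-joint pose shifted by a constant offset scores ~1.0.
pose = np.random.rand(5, 2)
print(pose_similarity(pose, pose + 10.0))
```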

Multi-graph Fusion for Multi-view Spectral Clustering

Title Multi-graph Fusion for Multi-view Spectral Clustering
Authors Zhao Kang, Guoxin Shi, Shudong Huang, Wenyu Chen, Xiaorong Pu, Joey Tianyi Zhou, Zenglin Xu
Abstract A panoply of multi-view clustering algorithms has been developed to deal with prevalent multi-view data. Among them, spectral clustering-based methods have drawn much attention and demonstrated promising results recently. Despite progress, there are still two fundamental questions that stay unanswered to date. First, how to fuse different views into one graph. More often than not, the similarities between samples may be manifested differently by different views. Many existing algorithms either simply take the average of multiple views or just learn a common graph. These simple approaches fail to consider the flexible local manifold structures of all views. Hence, the rich heterogeneous information is not fully exploited. Second, how to learn the explicit cluster structure. Most existing methods don’t pay attention to the quality of the graphs and perform graph learning and spectral clustering separately. Those unreliable graphs might lead to suboptimal clustering results. To fill these gaps, in this paper, we propose a novel multi-view spectral clustering model which performs graph fusion and spectral clustering simultaneously. The fusion graph approximates the original graph of each individual view but maintains an explicit cluster structure. Experiments on four widely used data sets confirm the superiority of the proposed method.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.06940v1
PDF https://arxiv.org/pdf/1909.06940v1.pdf
PWC https://paperswithcode.com/paper/multi-graph-fusion-for-multi-view-spectral
Repo https://github.com/sckangz/GFSC
Framework none
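
The abstract criticizes the naive strategy of averaging per-view graphs before spectral clustering. For reference, that baseline looks like the scikit-learn sketch below, whereas the proposed method learns the fusion graph and the cluster structure jointly.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def average_graph_clustering(views, n_clusters):
    """Naive baseline: average per-view affinity graphs, then spectral clustering.

    `views` is a list of (n_samples, d_v) feature matrices, one per view. The
    paper argues this ignores view-specific manifold structure; it is shown
    here only as the point of comparison.
    """
    affinities = [rbf_kernel(X) for X in views]
    fused = np.mean(affinities, axis=0)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(fused)

# Toy usage: two views of the same 6 samples.
rng = np.random.default_rng(0)
views = [rng.normal(size=(6, 4)), rng.normal(size=(6, 3))]
print(average_graph_clustering(views, n_clusters=2))
```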

Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics using Probabilistic Modeling

Title Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics using Probabilistic Modeling
Authors Ramtin Hosseini, Neda Hassanpour, Li-Ping Liu, Soha Hassoun
Abstract Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures measurements and known information about the sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives a probabilistic annotation, which assigns chemical identities to the measurements. PUMA is validated on synthetic datasets. When applied to test cases, the resulting pathway activities are biologically meaningful and distinctly different from those obtained using statistical pathway enrichment techniques. Annotation results are in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many additional measurements.
Tasks
Published 2019-12-12
URL https://arxiv.org/abs/1912.05753v2
PDF https://arxiv.org/pdf/1912.05753v2.pdf
PWC https://paperswithcode.com/paper/pathway-activity-analysis-and-metabolite
Repo https://github.com/HassounLab/PUMA
Framework none

Unsupervised Learning from Video with Deep Neural Embeddings

Title Unsupervised Learning from Video with Deep Neural Embeddings
Authors Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins
Abstract Because of the rich dynamical structure of videos and their ubiquity in everyday life, it is a natural idea that video data could serve as a powerful unsupervised learning signal for training visual representations in deep neural networks. However, instantiating this idea, especially at large scale, has remained a significant artificial intelligence challenge. Here we present the Video Instance Embedding (VIE) framework, which extends powerful recent unsupervised loss functions for learning deep nonlinear embeddings to multi-stream temporal processing architectures on large-scale video datasets. We show that VIE-trained networks substantially advance the state of the art in unsupervised learning from video datastreams, both for action recognition in the Kinetics dataset, and object recognition in the ImageNet dataset. We show that a hybrid model with both static and dynamic processing pathways is optimal for both transfer tasks, and provide analyses indicating how the pathways differ. Taken in context, our results suggest that deep neural embeddings are a promising approach to unsupervised visual learning across a wide variety of domains.
Tasks Object Recognition
Published 2019-05-28
URL https://arxiv.org/abs/1905.11954v2
PDF https://arxiv.org/pdf/1905.11954v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-from-video-with-deep
Repo https://github.com/Chrisackerman1/Unsupervised-Learning-from-Video-with-Deep-Neural-Embeddings
Framework none

Principled Training of Neural Networks with Direct Feedback Alignment

Title Principled Training of Neural Networks with Direct Feedback Alignment
Authors Julien Launay, Iacopo Poli, Florent Krzakala
Abstract The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods, where the error gradient is only roughly approximated, have garnered interest. These methods not only better portray how biological brains learn, but also open new computational possibilities, such as updating layers asynchronously. Even so, they have failed to scale past simple tasks like MNIST or CIFAR-10. This is in part due to a lack of standards, leading to ill-suited models and practices that prevent such methods from performing to the best of their abilities. In this work, we focus on direct feedback alignment and present a set of best practices justified by observations of the alignment angles. We characterize a bottleneck effect that prevents alignment in narrow layers, and hypothesize it may explain why feedback alignment methods have yet to scale to large convolutional networks.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04554v1
PDF https://arxiv.org/pdf/1906.04554v1.pdf
PWC https://paperswithcode.com/paper/principled-training-of-neural-networks-with
Repo https://github.com/lightonai/principled-dfa-training
Framework pytorch
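
Direct feedback alignment replaces the backpropagated error signal with a projection of the output error through a fixed random matrix. A self-contained NumPy sketch of DFA updates on a two-layer network follows; it illustrates the general method, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h, n_out, lr = 20, 64, 5, 0.05

# Forward weights (trained) and a fixed random feedback matrix (never trained).
W1 = rng.normal(scale=0.1, size=(n_in, n_h))
W2 = rng.normal(scale=0.1, size=(n_h, n_out))
B1 = rng.normal(scale=0.1, size=(n_out, n_h))   # DFA feedback: output error -> hidden layer

def dtanh(x):
    return 1.0 - np.tanh(x) ** 2

x = rng.normal(size=(32, n_in))
y = np.eye(n_out)[rng.integers(0, n_out, size=32)]   # one-hot targets

for step in range(100):
    a1 = x @ W1
    h1 = np.tanh(a1)
    logits = h1 @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    e = p - y                                   # output error (softmax + cross-entropy)

    # Backprop would use e @ W2.T here; DFA uses the fixed random projection instead.
    delta1 = (e @ B1) * dtanh(a1)

    W2 -= lr * h1.T @ e / len(x)
    W1 -= lr * x.T @ delta1 / len(x)
```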

Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

Title Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines
Authors Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, Abhishek Kar
Abstract We present a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image (MPI) scene representation, then renders novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we apply this bound to capture and render views of real world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. We demonstrate our approach’s practicality with an augmented reality smartphone app that guides users to capture input images of a scene and viewers that enable realtime virtual exploration on desktop and mobile platforms.
Tasks Novel View Synthesis
Published 2019-05-02
URL https://arxiv.org/abs/1905.00889v1
PDF https://arxiv.org/pdf/1905.00889v1.pdf
PWC https://paperswithcode.com/paper/local-light-field-fusion-practical-view
Repo https://github.com/Fyusion/LLFF
Framework tf
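
A multiplane image is rendered by standard front-to-back "over" compositing: each plane's colour is attenuated by the transmittance of the planes in front of it. A generic NumPy sketch of that step is below; it is not the authors' renderer, which additionally blends across neighbouring local light fields.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Render a multiplane image by "over" compositing.

    colors: (D, H, W, 3) RGB per depth plane, ordered from nearest to farthest.
    alphas: (D, H, W, 1) opacity per plane.
    Each plane contributes its colour attenuated by the transmittance of the
    planes in front of it: out = sum_d c_d * a_d * prod_{d' < d} (1 - a_{d'}).
    """
    transmittance = np.cumprod(
        np.concatenate([np.ones_like(alphas[:1]), 1.0 - alphas[:-1]], axis=0), axis=0
    )
    return (colors * alphas * transmittance).sum(axis=0)

# Toy usage: 4 planes of a 2x2 image.
D, H, W = 4, 2, 2
colors = np.random.rand(D, H, W, 3)
alphas = np.random.rand(D, H, W, 1)
print(composite_mpi(colors, alphas).shape)   # (2, 2, 3)
```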

ManiGAN: Text-Guided Image Manipulation

Title ManiGAN: Text-Guided Image Manipulation
Authors Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
Abstract The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: text-image affine combination module (ACM) and detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation. Meanwhile, it encodes original image features to help reconstruct text-irrelevant contents. The DCM rectifies mismatched attributes and completes missing contents of the synthetic image. Finally, we suggest a new metric for evaluating image manipulation results, in terms of both the generation of new attributes and the reconstruction of text-irrelevant contents. Extensive experiments on the CUB and COCO datasets demonstrate the superior performance of the proposed method. Code is available at https://github.com/mrlibw/ManiGAN.
Tasks
Published 2019-12-12
URL https://arxiv.org/abs/1912.06203v2
PDF https://arxiv.org/pdf/1912.06203v2.pdf
PWC https://paperswithcode.com/paper/manigan-text-guided-image-manipulation
Repo https://github.com/mrlibw/ManiGAN
Framework pytorch
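
The ACM is described as correlating text with image regions through an affine combination. The sketch below shows the general flavour of affine (scale-and-shift, FiLM-style) conditioning of one modality on another; it is an assumption for illustration only and does not reproduce the paper's exact module.

```python
import torch
import torch.nn as nn

class AffineCombination(nn.Module):
    """Illustrative affine combination of text and image features (FiLM-style).

    NOT the exact ACM from ManiGAN: one modality predicts per-channel scale and
    shift that modulate the other modality's feature map.
    """
    def __init__(self, text_dim, img_channels):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, img_channels)
        self.to_shift = nn.Linear(text_dim, img_channels)

    def forward(self, img_feat, text_feat):
        # img_feat: (B, C, H, W), text_feat: (B, text_dim)
        scale = self.to_scale(text_feat)[:, :, None, None]
        shift = self.to_shift(text_feat)[:, :, None, None]
        return img_feat * scale + shift

block = AffineCombination(text_dim=256, img_channels=64)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```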

Consistent Community Detection in Continuous-Time Networks of Relational Events

Title Consistent Community Detection in Continuous-Time Networks of Relational Events
Authors Makan Arastuie, Subhadeep Paul, Kevin S. Xu
Abstract In many application settings involving networks, such as messages between users of an on-line social network or transactions between traders in financial markets, the observed data are in the form of relational events with timestamps, which form a continuous-time network. We propose the Community Hawkes Independent Pairs (CHIP) model for community detection on such timestamped relational event data. We demonstrate that applying spectral clustering to adjacency matrices constructed from relational events generated by the CHIP model provides consistent community detection for a growing number of nodes. In particular, we obtain explicit non-asymptotic upper bounds on the misclustering rates based on the separation conditions required on the parameters of the model for consistent community detection. We also develop consistent and computationally efficient estimators for the parameters of the model. We demonstrate that our proposed CHIP model and estimation procedure scales to large networks with tens of thousands of nodes and provides superior fits compared to existing continuous-time network models on several real networks.
Tasks Community Detection
Published 2019-08-19
URL https://arxiv.org/abs/1908.06940v1
PDF https://arxiv.org/pdf/1908.06940v1.pdf
PWC https://paperswithcode.com/paper/consistent-community-detection-in-continuous
Repo https://github.com/IdeasLabUT/CHIP-Network-Model
Framework none
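
The central computational step, spectral clustering of an adjacency matrix built from relational event counts, can be sketched directly. This simplified illustration ignores the Hawkes-parameter estimation that CHIP performs afterwards and is not the released code.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def chip_style_clustering(events, n_nodes, n_communities):
    """Count events per ordered node pair, symmetrize, and spectrally cluster.

    `events` is an iterable of (sender, receiver, timestamp) tuples; timestamps
    are ignored in this count-matrix step.
    """
    A = np.zeros((n_nodes, n_nodes))
    for u, v, _ in events:
        A[u, v] += 1.0
    A_sym = A + A.T
    model = SpectralClustering(n_clusters=n_communities, affinity="precomputed")
    return model.fit_predict(A_sym + 1e-6)   # small jitter keeps the affinity graph connected

events = [(0, 1, 0.3), (1, 0, 0.9), (2, 3, 1.1), (3, 2, 1.4), (0, 1, 2.0)]
print(chip_style_clustering(events, n_nodes=4, n_communities=2))
```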

Music-oriented Dance Video Synthesis with Pose Perceptual Loss

Title Music-oriented Dance Video Synthesis with Pose Perceptual Loss
Authors Xuanchi Ren, Haoran Li, Zijian Huang, Qifeng Chen
Abstract We present a learning-based approach with pose perceptual loss for automatic music video generation. Our method can produce a realistic dance video that conforms to the beats and rhymes of almost any given music. To achieve this, we first generate a human skeleton sequence from music and then apply the learned pose-to-appearance mapping to generate the final video. In the stage of generating skeleton sequences, we utilize two discriminators to capture different aspects of the sequence and propose a novel pose perceptual loss to produce natural dances. Besides, we also provide a new cross-modal evaluation to assess dance quality, which is able to estimate the similarity between the two modalities of music and dance. Finally, a user study is conducted to demonstrate that dance videos synthesized by the presented approach are surprisingly realistic. The results are shown in the supplementary video at https://youtu.be/0rMuFMZa_K4
Tasks Video Generation
Published 2019-12-13
URL https://arxiv.org/abs/1912.06606v1
PDF https://arxiv.org/pdf/1912.06606v1.pdf
PWC https://paperswithcode.com/paper/music-oriented-dance-video-synthesis-with
Repo https://github.com/xrenaa/Music-Dance-Video-Synthesis
Framework pytorch
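
A pose perceptual loss, in the generic sense of comparing skeleton sequences in the feature space of a frozen pose network rather than in raw coordinates, can be sketched in PyTorch as follows. The toy feature network and tensor shapes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

def pose_perceptual_loss(feature_net, generated_pose, target_pose):
    """Compare skeleton sequences in the feature space of a frozen pose network.

    Both pose tensors have shape (B, T, J, 2) for T frames of J 2-D joints.
    Gradients flow only into `generated_pose`.
    """
    for p in feature_net.parameters():
        p.requires_grad_(False)
    f_gen = feature_net(generated_pose)
    with torch.no_grad():
        f_tgt = feature_net(target_pose)
    return nn.functional.l1_loss(f_gen, f_tgt)

# Toy stand-in feature network over flattened pose sequences (illustration only).
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(16 * 14 * 2, 64), nn.ReLU(), nn.Linear(64, 32))
gen = torch.randn(4, 16, 14, 2, requires_grad=True)
tgt = torch.randn(4, 16, 14, 2)
loss = pose_perceptual_loss(toy_net, gen, tgt)
loss.backward()
```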

Improving Cross-Domain Chinese Word Segmentation with Word Embeddings

Title Improving Cross-Domain Chinese Word Segmentation with Word Embeddings
Authors Yuxiao Ye, Yue Zhang, Weikang Li, Likun Qiu, Jian Sun
Abstract Cross-domain Chinese Word Segmentation (CWS) remains a challenge despite recent progress in neural-based CWS. The limited amount of annotated data in the target domain has been the key obstacle to satisfactory performance. In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter. Particularly, our model only deploys word embeddings trained on raw text in the target domain, discarding complex hand-crafted features and domain-specific dictionaries. Innovative subsampling and negative sampling methods are proposed to derive word embeddings optimized for CWS. We conduct experiments on five datasets in special domains, covering novels, medicine, and patents. Results show that our model can substantially improve cross-domain CWS, especially in the segmentation of domain-specific noun entities. The word F-measure increases by over 3.0% on four datasets, outperforming state-of-the-art semi-supervised and unsupervised cross-domain CWS approaches by a large margin. We make our code and data available on Github.
Tasks Chinese Word Segmentation, Word Embeddings
Published 2019-03-05
URL http://arxiv.org/abs/1903.01698v3
PDF http://arxiv.org/pdf/1903.01698v3.pdf
PWC https://paperswithcode.com/paper/improving-cross-domain-chinese-word
Repo https://github.com/vatile/CWS-NAACL2019
Framework tf
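
The subsampling mentioned above builds on the word2vec recipe of thinning out very frequent words; a generic sketch of that step is below. The paper adapts both subsampling and negative sampling specifically for CWS, which this simple version does not reproduce.

```python
import random
from collections import Counter

def subsample(corpus_tokens, t=1e-4, seed=0):
    """Frequency-based subsampling in the style of word2vec.

    Each occurrence of word w is kept with probability min(1, sqrt(t / f(w))),
    where f(w) is the word's relative frequency, so frequent words are thinned
    out and rarer, more informative words dominate the training signal.
    """
    random.seed(seed)
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    keep_prob = {w: min(1.0, (t / (c / total)) ** 0.5) for w, c in counts.items()}
    return [w for w in corpus_tokens if random.random() < keep_prob[w]]

# Toy usage: the very frequent function word is heavily downsampled.
toy = ["的"] * 50 + ["肝炎", "患者", "的", "治疗"] * 3
print(subsample(toy))
```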