January 31, 2020

2750 words 13 mins read

Paper Group AWR 452

Stacked Capsule Autoencoders. ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents. Segmenting the Future. Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction. GradNet: Gradient-Guided Network for Visual Object Tracking. Retrieval-based Localization Based on Domain-invariant …

Stacked Capsule Autoencoders

Title Stacked Capsule Autoencoders
Authors Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton
Abstract Objects are composed of a set of geometrically organized parts. We introduce an unsupervised capsule autoencoder (SCAE), which explicitly uses geometric relationships between parts to reason about objects. Since these relationships do not depend on the viewpoint, our model is robust to viewpoint changes. SCAE consists of two stages. In the first stage, the model predicts presences and poses of part templates directly from the image and tries to reconstruct the image by appropriately arranging the templates. In the second stage, SCAE predicts parameters of a few object capsules, which are then used to reconstruct part poses. Inference in this model is amortized and performed by off-the-shelf neural encoders, unlike in previous capsule networks. We find that object capsule presences are highly informative of the object class, which leads to state-of-the-art results for unsupervised classification on SVHN (55%) and MNIST (98.7%). The code is available at https://github.com/google-research/google-research/tree/master/stacked_capsule_autoencoders
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.06818v2
PDF https://arxiv.org/pdf/1906.06818v2.pdf
PWC https://paperswithcode.com/paper/stacked-capsule-autoencoders
Repo https://github.com/akosiorek/stacked_capsule_autoencoders
Framework tf
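
A minimal PyTorch sketch of the two-stage layout the abstract describes: a part encoder predicts part poses and presences from pixels, and a set encoder over those parts predicts object capsules that explain the part poses. The module sizes, the 2D-affine pose parameterization, and the MLP set encoder are illustrative assumptions, not the authors' released code (the paper uses a Set Transformer):

```python
import torch
import torch.nn as nn

class PartEncoder(nn.Module):
    """Stage 1: predict part presences and poses directly from the image."""
    def __init__(self, n_parts=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 6 pose parameters (a 2D affine) + 1 presence logit per part
        self.head = nn.LazyLinear(n_parts * 7)
        self.n_parts = n_parts

    def forward(self, img):
        out = self.head(self.backbone(img)).view(-1, self.n_parts, 7)
        poses, presence = out[..., :6], torch.sigmoid(out[..., 6])
        return poses, presence

class ObjectCapsules(nn.Module):
    """Stage 2: a set encoder over parts predicts object capsules,
    which are then used to reconstruct the part poses."""
    def __init__(self, n_parts=16, n_objects=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_parts * 7, 128), nn.ReLU())
        self.obj_head = nn.Linear(128, n_objects * 7)  # object pose + presence
        self.n_objects = n_objects

    def forward(self, poses, presence):
        flat = torch.cat([poses, presence.unsqueeze(-1)], -1).flatten(1)
        out = self.obj_head(self.encoder(flat)).view(-1, self.n_objects, 7)
        obj_pose, obj_presence = out[..., :6], torch.sigmoid(out[..., 6])
        return obj_pose, obj_presence
```

The object-capsule presences returned by stage 2 are what the paper reports as being highly informative of the class label.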

ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents

Title ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents
Authors Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, Andreas Maier
Abstract This competition investigates the performance of large-scale retrieval of historical document images based on writing style. It builds on large image data sets provided by cultural heritage institutions and digital libraries, totaling 20 000 document images representing about 10 000 writers, divided into three types: writers of (i) manuscript books, (ii) letters, and (iii) charters and legal documents. We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as writer retrieval. Most teams submitted traditional methods that do not use deep learning techniques. The competition results show that a combination of methods outperforms single methods. Furthermore, letters are much more difficult to retrieve than manuscripts.
Tasks Image Retrieval
Published 2019-12-08
URL https://arxiv.org/abs/1912.03713v1
PDF https://arxiv.org/pdf/1912.03713v1.pdf
PWC https://paperswithcode.com/paper/icdar-2019-competition-on-image-retrieval-for
Repo https://github.com/masyagin1998/robin
Framework tf
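
The ranking step of writer retrieval reduces to nearest-neighbor search over style descriptors; a small NumPy sketch (the descriptor itself, e.g. an aggregate of local handwriting features, is whatever a competition entry computes, so it is only assumed here):

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=10):
    """Rank database documents by cosine similarity of style descriptors.

    query_desc: (d,) descriptor of the query page; db_descs: (n, d) database.
    Returns the indices of the top_k most similar documents.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:top_k]
```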

Segmenting the Future

Title Segmenting the Future
Authors Hsu-kuang Chiu, Ehsan Adeli, Juan Carlos Niebles
Abstract Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specific for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes and Apolloscape outperform the baseline and current state-of-the-art methods. Code is available at https://github.com/eddyhkchiu/segmenting_the_future/.
Tasks Autonomous Driving, Decision Making, Scene Understanding, Semantic Segmentation
Published 2019-04-24
URL https://arxiv.org/abs/1904.10666v2
PDF https://arxiv.org/pdf/1904.10666v2.pdf
PWC https://paperswithcode.com/paper/segmenting-the-future
Repo https://github.com/eddyhkchiu/segmenting_the_future
Framework none
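
A hedged sketch of how the knowledge-distillation coupling could look: a teacher segments the actual future frame, and the student, which sees only past RGB frames, matches both the hard labels and the teacher's soft targets. The temperature and mixing weight are illustrative, and the paper's exact distillation objective may differ:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy plus soft-target KL, as in standard
    knowledge distillation. student_logits: (B, C, H, W) predicted from
    past RGB frames only; teacher_logits: (B, C, H, W) from a segmentation
    network that sees the real future frame; labels: (B, H, W)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft
```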

Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction

Title Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction
Authors Wenyi Xiao, Huan Zhao, Haojie Pan, Yangqiu Song, Vincent W. Zheng, Qiang Yang
Abstract Effective content recommendation in modern social media platforms should benefit both creators, by bringing them genuine rewards, and consumers, by helping them find genuinely interesting content. In this paper, we propose a model called Social Explorative Attention Network (SEAN) for content recommendation. SEAN uses a personalized content recommendation model to encourage recommendation driven by personal interests. Moreover, SEAN allows the personalization factors to attend to users’ higher-order friends on the social network, improving the accuracy and diversity of recommendation results. Constructing two datasets from Steemit, a popular decentralized content distribution platform, we compare SEAN with state-of-the-art collaborative filtering (CF) and content-based recommendation approaches. Experimental results demonstrate the effectiveness of SEAN in terms of both Gini coefficients for recommendation equality and F1 scores for recommendation performance.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11900v3
PDF https://arxiv.org/pdf/1905.11900v3.pdf
PWC https://paperswithcode.com/paper/beyond-personalization-social-content
Repo https://github.com/HKUST-KnowComp/Social-Explorative-Attention-Networks
Framework none
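
A hypothetical sketch of the social attention idea: the consumer's embedding queries the representations of their (higher-order) friends to build a social context vector. Dimensions and projections are illustrative, not the released SEAN code:

```python
import torch
import torch.nn as nn

class SocialAttention(nn.Module):
    """Attend over a user's friends: the reader's embedding forms the query,
    friends' embeddings form keys and values (a sketch of the idea only)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, user_emb, friend_embs):
        # user_emb: (B, d); friend_embs: (B, F, d) from 1st/2nd-order friends
        scores = torch.einsum('bd,bfd->bf', self.q(user_emb), self.k(friend_embs))
        attn = torch.softmax(scores / friend_embs.size(-1) ** 0.5, dim=-1)
        return torch.einsum('bf,bfd->bd', attn, friend_embs)  # social context
```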

GradNet: Gradient-Guided Network for Visual Object Tracking

Title GradNet: Gradient-Guided Network for Visual Object Tracking
Authors Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu
Abstract The fully-convolutional siamese network based on template matching has shown great potential in visual tracking. During testing, the template is fixed to the initial target feature, and performance relies entirely on the general matching ability of the siamese network. However, this scheme cannot capture temporal variations of targets or background clutter. In this work, we propose a novel gradient-guided network that exploits the discriminative information in gradients and updates the template in the siamese network through feed-forward and backward operations, capturing the core attention of the target. Specifically, the algorithm uses gradient information to update the template in the current frame. In addition, a template generalization training method is proposed to better use gradient information and avoid overfitting. To our knowledge, this work is the first attempt to exploit gradient information for template update in siamese-based trackers. Extensive experiments on recent benchmarks demonstrate that our method achieves better performance than other state-of-the-art trackers.
Tasks Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-09-15
URL https://arxiv.org/abs/1909.06800v1
PDF https://arxiv.org/pdf/1909.06800v1.pdf
PWC https://paperswithcode.com/paper/gradnet-gradient-guided-network-for-visual
Repo https://github.com/LPXTT/GradNet-Pytorch
Framework pytorch
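
A sketch of one feed-forward/backward template update in the spirit of the abstract. In the paper the gradient is passed through learned layers; here a plain gradient step stands in for that, and `match_fn` and `target_map` are assumed inputs:

```python
import torch

def gradient_guided_update(template, search_feat, match_fn, target_map, lr=1e-4):
    """One feed-forward/backward template update (illustrative, not the
    released GradNet code). match_fn correlates the template with
    search-region features to produce a response map; target_map marks
    where the target should respond in the current frame."""
    template = template.detach().requires_grad_(True)
    response = match_fn(template, search_feat)            # feed-forward
    loss = torch.nn.functional.mse_loss(response, target_map)
    loss.backward()                                       # backward pass
    with torch.no_grad():
        # the gradient carries discriminative information about the frame
        updated = template - lr * template.grad
    return updated.detach()
```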

Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments

Title Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments
Authors Hanjiang Hu, Hesheng Wang, Zhe Liu, Chenguang Yang, Weidong Chen, Le Xie
Abstract Visual localization is a crucial problem in mobile robotics and autonomous driving. One solution is to retrieve images with known pose from a database for the localization of query images. However, in environments with drastically varying conditions (e.g., illumination changes, seasons, occlusion, dynamic objects), retrieval-based localization is severely hampered and becomes a challenging problem. In this paper, a novel domain-invariant feature learning method (DIFL) is proposed based on ComboGAN, a multi-domain image translation network architecture. By introducing a feature consistency loss (FCL) between the encoded features of the original image and its translation into another domain, we are able to train the encoders to generate domain-invariant features in a self-supervised manner. To retrieve a target image from the database, the query image is first encoded using the encoder belonging to the query domain to obtain a domain-invariant feature vector. We then perform retrieval by selecting the database image with the most similar domain-invariant feature vector. We validate the proposed approach on the CMU-Seasons dataset, where we outperform state-of-the-art learning-based descriptors in retrieval-based localization for high and medium precision scenarios.
Tasks Autonomous Driving, Visual Localization
Published 2019-09-23
URL https://arxiv.org/abs/1909.10184v1
PDF https://arxiv.org/pdf/1909.10184v1.pdf
PWC https://paperswithcode.com/paper/190910184
Repo https://github.com/HanjiangHu/DIFL-FCL
Framework pytorch
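
A sketch of the feature consistency loss (FCL) under stated assumptions: encode the original image, translate it into another domain through the generator path, re-encode it with that domain's encoder, and penalize the distance between the two codes. Argument names and the L1 distance are illustrative; the released code wires this into ComboGAN's training loop:

```python
import torch.nn.functional as F

def feature_consistency_loss(enc_src, enc_tgt, dec_tgt, image_src):
    """FCL sketch: the code of an image and the code of its translation
    into another domain should match, making the codes domain-invariant.
    enc_src/enc_tgt are the per-domain encoders, dec_tgt the target-domain
    decoder (generator path); names are assumptions, not the repo's API."""
    z_src = enc_src(image_src)           # code of the original image
    translated = dec_tgt(z_src)          # image rendered in the target domain
    z_tgt = enc_tgt(translated)          # code of the translated image
    return F.l1_loss(z_tgt, z_src.detach())
```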

Libri-Light: A Benchmark for ASR with Limited or No Supervision

Title Libri-Light: A Benchmark for ASR with Limited or No Supervision
Authors Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux
Abstract We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
Tasks Action Detection, Activity Detection, Speech Recognition
Published 2019-12-17
URL https://arxiv.org/abs/1912.07875v1
PDF https://arxiv.org/pdf/1912.07875v1.pdf
PWC https://paperswithcode.com/paper/libri-light-a-benchmark-for-asr-with-limited
Repo https://github.com/facebookresearch/libri-light
Framework none
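
For reference, the WER used in setting (3) is the word-level edit distance normalized by reference length; a self-contained sketch (the CER of setting (2) is the same computation over characters):

```python
def word_error_rate(ref, hyp):
    """WER = Levenshtein distance between reference and hypothesis word
    sequences, divided by the reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # 0.333...
```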

Towards Ethical Content-Based Detection of Online Influence Campaigns

Title Towards Ethical Content-Based Detection of Online Influence Campaigns
Authors Evan Crothers, Nathalie Japkowicz, Herna Viktor
Abstract The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use “named entity masking” (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research.
Tasks Language Identification, Native Language Identification
Published 2019-08-29
URL https://arxiv.org/abs/1908.11030v1
PDF https://arxiv.org/pdf/1908.11030v1.pdf
PWC https://paperswithcode.com/paper/towards-ethical-content-based-detection-of
Repo https://github.com/ecrows/l2-reddit-experiment
Framework tf
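
A sketch of the named-entity-masking idea using spaCy's off-the-shelf NER; the paper's exact masking scheme and NER model are not specified here, so treat the label format as an assumption:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any NER model works for this sketch

def named_entity_mask(text):
    """Replace named entities with their entity labels so a classifier
    cannot latch onto campaign-specific keywords (NEM, sketched)."""
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        out.append(text[last:ent.start_char])
        out.append(f"[{ent.label_}]")
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(named_entity_mask("Hillary Clinton visited Ohio."))
# typically -> "[PERSON] visited [GPE]."
```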

Deep Adversarial Social Recommendation

Title Deep Adversarial Social Recommendation
Authors Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li
Abstract Recent years have witnessed rapid developments in social recommendation techniques for improving recommender systems, owing to the growing influence of social networks on our daily lives. The majority of existing social recommendation methods unify the user representation for user-item interactions (item domain) and user-user connections (social domain). However, this may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous. In addition, most traditional recommender systems cannot efficiently optimize these objectives, since they rely on negative sampling, which fails to provide sufficiently informative guidance during training. In this paper, to address the aforementioned challenges, we propose DASO, a novel deep adversarial social recommendation framework. It adopts a bidirectional mapping method to transfer users’ information between the social domain and the item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.
Tasks Recommendation Systems, Representation Learning
Published 2019-05-30
URL https://arxiv.org/abs/1905.13160v1
PDF https://arxiv.org/pdf/1905.13160v1.pdf
PWC https://paperswithcode.com/paper/deep-adversarial-social-recommendation
Repo https://github.com/wenqifan03/GraphRec-WWW19
Framework pytorch
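
A toy sketch of the bidirectional-mapping idea: a generator maps a user's item-domain representation into the social domain while a discriminator tries to tell mapped representations from real social-domain ones (the reverse direction is symmetric). Network shapes and the loss bookkeeping are illustrative, not the authors' training code:

```python
import torch
import torch.nn as nn

dim = 64
gen_i2s = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
disc_s = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

def adversarial_losses(u_item, u_social):
    """u_item/u_social: (B, dim) user representations from each domain."""
    bce = nn.functional.binary_cross_entropy_with_logits
    fake = gen_i2s(u_item)   # item-domain user mapped into the social domain
    d_loss = bce(disc_s(u_social), torch.ones(u_social.size(0), 1)) + \
             bce(disc_s(fake.detach()), torch.zeros(u_item.size(0), 1))
    g_loss = bce(disc_s(fake), torch.ones(u_item.size(0), 1))
    return d_loss, g_loss
```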

Incorporating Sememes into Chinese Definition Modeling

Title Incorporating Sememes into Chinese Definition Modeling
Authors Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang
Abstract Chinese definition modeling is a challenging task that generates a dictionary definition in Chinese for a given Chinese word. To accomplish this task, we construct the Chinese Definition Modeling Corpus (CDM), which contains triples of word, sememes and the corresponding definition. We present two novel models to improve Chinese definition modeling: the Adaptive-Attention model (AAM) and the Self- and Adaptive-Attention Model (SAAM). AAM successfully incorporates sememes for generating the definition with an adaptive attention mechanism. It has the capability to decide which sememes to focus on and when to pay attention to sememes. SAAM further replaces recurrent connections in AAM with self-attention and relies entirely on the attention mechanism, reducing the path length between word, sememes and definition. Experiments on CDM demonstrate that by incorporating sememes, our best proposed model can outperform the state-of-the-art method by +6.0 BLEU.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.06512v1
PDF https://arxiv.org/pdf/1905.06512v1.pdf
PWC https://paperswithcode.com/paper/incorporating-sememes-into-chinese-definition
Repo https://github.com/blcu-nlp/chinese-definition
Framework pytorch
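
A sketch of the adaptive attention mechanism as described: a sentinel vector competes with the sememe embeddings for attention mass, so the decoder can choose when to attend to sememes and when to ignore them. Dimensions and the sentinel construction are assumptions, not the released AAM/SAAM code:

```python
import torch
import torch.nn as nn

class AdaptiveSememeAttention(nn.Module):
    """Adaptive attention over sememes with a learned sentinel (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, dim)
        self.sentinel = nn.Linear(dim, dim)

    def forward(self, dec_state, sememe_embs):
        # dec_state: (B, d); sememe_embs: (B, S, d)
        s = torch.tanh(self.sentinel(dec_state)).unsqueeze(1)   # (B, 1, d)
        cand = torch.cat([sememe_embs, s], dim=1)               # sememes + sentinel
        scores = torch.einsum('bd,bsd->bs', self.attn(dec_state), cand)
        w = torch.softmax(scores, dim=-1)
        context = torch.einsum('bs,bsd->bd', w, cand)
        return context, w[:, -1]  # sentinel weight = "ignore sememes" mass
```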

Gradient-based Adaptive Markov Chain Monte Carlo

Title Gradient-based Adaptive Markov Chain Monte Carlo
Authors Michalis K. Titsias, Petros Dellaportas
Abstract We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01373v2
PDF https://arxiv.org/pdf/1911.01373v2.pdf
PWC https://paperswithcode.com/paper/gradient-based-adaptive-markov-chain-monte
Repo https://github.com/mtitsias/gadMCMC
Framework none
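
A toy NumPy rendition of the mechanism on a Gaussian target: the proposal scale follows a stochastic gradient of the log-acceptance probability plus an entropy bonus, so adaptation proceeds even when candidates are rejected. This illustrates the idea only; the paper's generalised speed measure and full-covariance proposals are richer than this scalar-step sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):            # toy target: standard Gaussian
    return -0.5 * float(x @ x)

def grad_log_pi(x):
    return -x

d, sigma, beta, lr = 2, 0.1, 1.0, 0.01   # illustrative hyperparameters
x = np.zeros(d)
for _ in range(5000):
    eps = rng.standard_normal(d)
    y = x + sigma * eps                   # reparameterised proposal
    log_a = min(0.0, log_pi(y) - log_pi(x))
    # Pathwise gradient of log-acceptance w.r.t. sigma (zero when log_a == 0),
    # plus the derivative of the entropy bonus beta * d * log(sigma):
    # the scale is adapted even when the candidate ends up rejected.
    g = (0.0 if log_a == 0.0 else float(grad_log_pi(y) @ eps)) + beta * d / sigma
    sigma = max(sigma + lr * g, 1e-3)
    if np.log(rng.uniform()) < log_a:     # Metropolis accept/reject
        x = y
print(sigma)   # adapted proposal scale
```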

UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs

Title UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs
Authors Jian Zhu, Zuoyu Tian, Sandra Kübler
Abstract This paper describes the UM-IU@LING’s system for the SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.
Tasks
Published 2019-04-06
URL http://arxiv.org/abs/1904.03450v1
PDF http://arxiv.org/pdf/1904.03450v1.pdf
PWC https://paperswithcode.com/paper/um-iuling-at-semeval-2019-task-6-identifying
Repo https://github.com/zytian9/SemEval-2019-Task-6
Framework pytorch
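
The subtask B/C pipeline the abstract describes maps naturally onto scikit-learn; a minimal sketch in which the n-gram range, tf-idf weighting, and toy data are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# A linear SVM over character n-gram features, roughly as the paper
# describes for subtasks B and C (feature selection omitted here).
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(),
)
train_tweets = ["you are awful", "lovely day"]   # toy data
train_labels = ["OFF", "NOT"]
clf.fit(train_tweets, train_labels)
print(clf.predict(["what an awful take"]))
```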

NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep

Title NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep
Authors Parag Agrawal, Anshuman Suri
Abstract Existing machine learning techniques yield close-to-human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data, such as emoticons, slang, spelling mistakes, and code-mixed data, makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines lexical and deep-learning based methods for sentiment classification. We evaluate our system on the third task of SemEval-2019, ‘Contextual Emotion Detection in Text’. Our system performs significantly better than the baseline as well as our deep-learning model benchmarks, achieving a micro-averaged F1 score of 0.7765 and ranking 3rd on the test-set leaderboard. Our code is available at https://github.com/iamgroot42/nelec
Tasks Sentiment Analysis
Published 2019-04-05
URL http://arxiv.org/abs/1904.03223v1
PDF http://arxiv.org/pdf/1904.03223v1.pdf
PWC https://paperswithcode.com/paper/nelec-at-semeval-2019-task-3-think-twice
Repo https://github.com/iamgroot42/nelec
Framework tf
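
A hypothetical sketch of the neural-plus-lexical combination: concatenate a sentence embedding from any neural encoder with hand-crafted lexical counts and train a simple classifier on top. The specific features, classifier, and toy data are assumptions, not the NELEC implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lexical_feats(text):
    """A few hand-crafted lexical signals of the kind chat data carries."""
    return np.array([
        text.count(":)") + text.count(":D"),   # emoticon count
        sum(c.isupper() for c in text),        # shouting
        text.count("!"),
    ], dtype=float)

def combine(neural_emb, text):
    return np.concatenate([neural_emb, lexical_feats(text)])

# Toy usage: random vectors stand in for a real neural sentence encoder.
rng = np.random.default_rng(0)
texts, labels = ["great :D", "WHY?!"], ["happy", "angry"]
X = np.stack([combine(rng.standard_normal(8), t) for t in texts])
clf = LogisticRegression().fit(X, labels)
```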

CraftAssist: A Framework for Dialogue-enabled Interactive Agents

Title CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Authors Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam
Abstract This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions. The purpose of building such an assistant is to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions.
Tasks
Published 2019-07-19
URL https://arxiv.org/abs/1907.08584v1
PDF https://arxiv.org/pdf/1907.08584v1.pdf
PWC https://paperswithcode.com/paper/craftassist-a-framework-for-dialogue-enabled
Repo https://github.com/facebookresearch/craftassist
Framework pytorch

Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling

Title Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling
Authors Yifan Gao, Piji Li, Irwin King, Michael R. Lyu
Abstract We study the problem of generating interconnected questions in question-answering style conversations. Compared with previous works which generate questions based on a single sentence (or paragraph), this setting is different in two major aspects: (1) Questions are highly conversational. Almost half of them refer back to conversation history using coreferences. (2) In a coherent conversation, questions have smooth transitions between turns. We propose an end-to-end neural model with coreference alignment and conversation flow modeling. The coreference alignment modeling explicitly aligns coreferent mentions in conversation history with corresponding pronominal references in generated questions, which makes generated questions interconnected to conversation history. The conversation flow modeling builds a coherent conversation by starting questioning on the first few sentences in a text passage and smoothly shifting the focus to later parts. Extensive experiments show that our system outperforms several baselines and can generate highly conversational questions. The code implementation is released at https://github.com/Evan-Gao/conversational-QG.
Tasks Question Answering, Question Generation
Published 2019-06-17
URL https://arxiv.org/abs/1906.06893v1
PDF https://arxiv.org/pdf/1906.06893v1.pdf
PWC https://paperswithcode.com/paper/interconnected-question-generation-with
Repo https://github.com/Evan-Gao/conversational-QG
Framework pytorch
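
A sketch of coreference alignment read as supervised attention: when the generated question emits a pronoun, the decoder's attention is pushed toward the antecedent mention in the conversation history. The loss form and inputs are assumptions rather than the released implementation:

```python
import torch

def coref_alignment_loss(attn_weights, coref_positions):
    """Supervised-attention sketch of coreference alignment.

    attn_weights: (T, S) decoder attention over history/source tokens.
    coref_positions: list of (t, src_idx) pairs linking a generated pronoun
    at decoding step t to its antecedent token src_idx in the history.
    """
    loss = 0.0
    for t, src_idx in coref_positions:
        # maximize the attention mass placed on the coreferent mention
        loss = loss - torch.log(attn_weights[t, src_idx] + 1e-9)
    return loss / max(len(coref_positions), 1)
```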