January 31, 2020

2750 words 13 mins read

Paper Group AWR 452

Stacked Capsule Autoencoders. ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents. Segmenting the Future. Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction. GradNet: Gradient-Guided Network for Visual Object Tracking. Retrieval-based Localization Based on Domain-invariant …

Stacked Capsule Autoencoders

Title Stacked Capsule Autoencoders
Authors Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton
Abstract Objects are composed of a set of geometrically organized parts. We introduce an unsupervised capsule autoencoder (SCAE), which explicitly uses geometric relationships between parts to reason about objects. Since these relationships do not depend on the viewpoint, our model is robust to viewpoint changes. SCAE consists of two stages. In the first stage, the model predicts presences and poses of part templates directly from the image and tries to reconstruct the image by appropriately arranging the templates. In the second stage, SCAE predicts parameters of a few object capsules, which are then used to reconstruct part poses. Inference in this model is amortized and performed by off-the-shelf neural encoders, unlike in previous capsule networks. We find that object capsule presences are highly informative of the object class, which leads to state-of-the-art results for unsupervised classification on SVHN (55%) and MNIST (98.7%). The code is available at https://github.com/google-research/google-research/tree/master/stacked_capsule_autoencoders
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.06818v2
PDF https://arxiv.org/pdf/1906.06818v2.pdf
PWC https://paperswithcode.com/paper/stacked-capsule-autoencoders
Repo https://github.com/akosiorek/stacked_capsule_autoencoders
Framework tf
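
A minimal PyTorch sketch of the two-stage layout the abstract describes: a part encoder predicts part poses and presences from pixels, and a set encoder over those parts predicts object capsules that explain the part poses. The module sizes, the 2D-affine pose parameterization, and the MLP set encoder are illustrative assumptions, not the authors' released code (the paper uses a Set Transformer):

```python
import torch
import torch.nn as nn

class PartEncoder(nn.Module):
    """Stage 1: predict part presences and poses directly from the image."""
    def __init__(self, n_parts=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 6 pose parameters (a 2D affine) + 1 presence logit per part
        self.head = nn.LazyLinear(n_parts * 7)
        self.n_parts = n_parts

    def forward(self, img):
        out = self.head(self.backbone(img)).view(-1, self.n_parts, 7)
        poses, presence = out[..., :6], torch.sigmoid(out[..., 6])
        return poses, presence

class ObjectCapsules(nn.Module):
    """Stage 2: a set encoder over parts predicts object capsules,
    which are then used to reconstruct the part poses."""
    def __init__(self, n_parts=16, n_objects=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_parts * 7, 128), nn.ReLU())
        self.obj_head = nn.Linear(128, n_objects * 7)  # object pose + presence
        self.n_objects = n_objects

    def forward(self, poses, presence):
        flat = torch.cat([poses, presence.unsqueeze(-1)], -1).flatten(1)
        out = self.obj_head(self.encoder(flat)).view(-1, self.n_objects, 7)
        obj_pose, obj_presence = out[..., :6], torch.sigmoid(out[..., 6])
        return obj_pose, obj_presence
```

The object-capsule presences returned by stage 2 are what the paper reports as being highly informative of the class label.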

ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents

Title ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents
Authors Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, Andreas Maier
Abstract This competition investigates the performance of large-scale retrieval of historical document images based on writing style. It builds on large image data sets provided by cultural heritage institutions and digital libraries, totaling 20 000 document images representing about 10 000 writers, divided into three types: writers of (i) manuscript books, (ii) letters, and (iii) charters and legal documents. We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as writer retrieval. Most teams submitted traditional methods that do not use deep learning techniques. The competition results show that a combination of methods outperforms single methods. Furthermore, letters are much more difficult to retrieve than manuscripts.
Tasks Image Retrieval
Published 2019-12-08
URL https://arxiv.org/abs/1912.03713v1
PDF https://arxiv.org/pdf/1912.03713v1.pdf
PWC https://paperswithcode.com/paper/icdar-2019-competition-on-image-retrieval-for
Repo https://github.com/masyagin1998/robin
Framework tf
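
The ranking step of writer retrieval reduces to nearest-neighbor search over style descriptors; a small NumPy sketch (the descriptor itself, e.g. an aggregate of local handwriting features, is whatever a competition entry computes, so it is only assumed here):

```python
import numpy as np

def retrieve(query_desc, db_descs, top_k=10):
    """Rank database documents by cosine similarity of style descriptors.

    query_desc: (d,) descriptor of the query page; db_descs: (n, d) database.
    Returns the indices of the top_k most similar documents.
    """
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:top_k]
```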

Segmenting the Future

Title Segmenting the Future
Authors Hsu-kuang Chiu, Ehsan Adeli, Juan Carlos Niebles
Abstract Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specific for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes and Apolloscape outperform the baseline and current state-of-the-art methods. Code is available at https://github.com/eddyhkchiu/segmenting_the_future/.
Tasks Autonomous Driving, Decision Making, Scene Understanding, Semantic Segmentation
Published 2019-04-24
URL https://arxiv.org/abs/1904.10666v2
PDF https://arxiv.org/pdf/1904.10666v2.pdf
PWC https://paperswithcode.com/paper/segmenting-the-future
Repo https://github.com/eddyhkchiu/segmenting_the_future
Framework none
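
A hedged sketch of how the knowledge-distillation coupling could look: a teacher segments the actual future frame, and the student, which sees only past RGB frames, matches both the hard labels and the teacher's soft targets. The temperature and mixing weight are illustrative, and the paper's exact distillation objective may differ:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy plus soft-target KL, as in standard
    knowledge distillation. student_logits: (B, C, H, W) predicted from
    past RGB frames only; teacher_logits: (B, C, H, W) from a segmentation
    network that sees the real future frame; labels: (B, H, W)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft
```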

Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction

Title Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction
Authors Wenyi Xiao, Huan Zhao, Haojie Pan, Yangqiu Song, Vincent W. Zheng, Qiang Yang
Abstract Effective content recommendation in modern social media platforms should benefit both creators, by bringing them genuine rewards, and consumers, by helping them find genuinely interesting content. In this paper, we propose a model called Social Explorative Attention Network (SEAN) for content recommendation. SEAN uses a personalized content recommendation model to encourage recommendation driven by personal interests. Moreover, SEAN allows the personalization factors to attend to users’ higher-order friends on the social network, improving the accuracy and diversity of recommendation results. Constructing two datasets from Steemit, a popular decentralized content distribution platform, we compare SEAN with state-of-the-art collaborative filtering (CF) and content-based recommendation approaches. Experimental results demonstrate the effectiveness of SEAN in terms of both Gini coefficients for recommendation equality and F1 scores for recommendation performance.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11900v3
PDF https://arxiv.org/pdf/1905.11900v3.pdf
PWC https://paperswithcode.com/paper/beyond-personalization-social-content
Repo https://github.com/HKUST-KnowComp/Social-Explorative-Attention-Networks
Framework none
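
A hypothetical sketch of the social attention idea: the consumer's embedding queries the representations of their (higher-order) friends to build a social context vector. Dimensions and projections are illustrative, not the released SEAN code:

```python
import torch
import torch.nn as nn

class SocialAttention(nn.Module):
    """Attend over a user's friends: the reader's embedding forms the query,
    friends' embeddings form keys and values (a sketch of the idea only)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, user_emb, friend_embs):
        # user_emb: (B, d); friend_embs: (B, F, d) from 1st/2nd-order friends
        scores = torch.einsum('bd,bfd->bf', self.q(user_emb), self.k(friend_embs))
        attn = torch.softmax(scores / friend_embs.size(-1) ** 0.5, dim=-1)
        return torch.einsum('bf,bfd->bd', attn, friend_embs)  # social context
```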

GradNet: Gradient-Guided Network for Visual Object Tracking

Title GradNet: Gradient-Guided Network for Visual Object Tracking
Authors Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu
Abstract The fully-convolutional siamese network based on template matching has shown great potential in visual tracking. During testing, the template is fixed to the initial target feature, and performance relies entirely on the general matching ability of the siamese network. However, this scheme cannot capture temporal variations of targets or background clutter. In this work, we propose a novel gradient-guided network that exploits the discriminative information in gradients and updates the template in the siamese network through feed-forward and backward operations, capturing the core attention of the target. Specifically, the algorithm uses gradient information to update the template in the current frame. In addition, a template generalization training method is proposed to better use gradient information and avoid overfitting. To our knowledge, this work is the first attempt to exploit gradient information for template update in siamese-based trackers. Extensive experiments on recent benchmarks demonstrate that our method achieves better performance than other state-of-the-art trackers.
Tasks Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-09-15
URL https://arxiv.org/abs/1909.06800v1
PDF https://arxiv.org/pdf/1909.06800v1.pdf
PWC https://paperswithcode.com/paper/gradnet-gradient-guided-network-for-visual
Repo https://github.com/LPXTT/GradNet-Pytorch
Framework pytorch
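
A sketch of one feed-forward/backward template update in the spirit of the abstract. In the paper the gradient is passed through learned layers; here a plain gradient step stands in for that, and `match_fn` and `target_map` are assumed inputs:

```python
import torch

def gradient_guided_update(template, search_feat, match_fn, target_map, lr=1e-4):
    """One feed-forward/backward template update (illustrative, not the
    released GradNet code). match_fn correlates the template with
    search-region features to produce a response map; target_map marks
    where the target should respond in the current frame."""
    template = template.detach().requires_grad_(True)
    response = match_fn(template, search_feat)            # feed-forward
    loss = torch.nn.functional.mse_loss(response, target_map)
    loss.backward()                                       # backward pass
    with torch.no_grad():
        # the gradient carries discriminative information about the frame
        updated = template - lr * template.grad
    return updated.detach()
```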

Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments

Title Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments
Authors Hanjiang Hu, Hesheng Wang, Zhe Liu, Chenguang Yang, Weidong Chen, Le Xie
Abstract Visual localization is a crucial problem in mobile robotics and autonomous driving. One solution is to retrieve images with known pose from a database for the localization of query images. However, in environments with drastically varying conditions (e.g., illumination changes, seasons, occlusion, dynamic objects), retrieval-based localization is severely hampered and becomes a challenging problem. In this paper, a novel domain-invariant feature learning method (DIFL) is proposed based on ComboGAN, a multi-domain image translation network architecture. By introducing a feature consistency loss (FCL) between the encoded features of the original image and its translation into another domain, we are able to train the encoders to generate domain-invariant features in a self-supervised manner. To retrieve a target image from the database, the query image is first encoded using the encoder belonging to the query domain to obtain a domain-invariant feature vector. We then perform retrieval by selecting the database image with the most similar domain-invariant feature vector. We validate the proposed approach on the CMU-Seasons dataset, where we outperform state-of-the-art learning-based descriptors in retrieval-based localization for high and medium precision scenarios.
Tasks Autonomous Driving, Visual Localization
Published 2019-09-23
URL https://arxiv.org/abs/1909.10184v1
PDF https://arxiv.org/pdf/1909.10184v1.pdf
PWC https://paperswithcode.com/paper/190910184
Repo https://github.com/HanjiangHu/DIFL-FCL
Framework pytorch
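
A sketch of the feature consistency loss (FCL) under stated assumptions: encode the original image, translate it into another domain through the generator path, re-encode it with that domain's encoder, and penalize the distance between the two codes. Argument names and the L1 distance are illustrative; the released code wires this into ComboGAN's training loop:

```python
import torch.nn.functional as F

def feature_consistency_loss(enc_src, enc_tgt, dec_tgt, image_src):
    """FCL sketch: the code of an image and the code of its translation
    into another domain should match, making the codes domain-invariant.
    enc_src/enc_tgt are the per-domain encoders, dec_tgt the target-domain
    decoder (generator path); names are assumptions, not the repo's API."""
    z_src = enc_src(image_src)           # code of the original image
    translated = dec_tgt(z_src)          # image rendered in the target domain
    z_tgt = enc_tgt(translated)          # code of the translated image
    return F.l1_loss(z_tgt, z_src.detach())
```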

Libri-Light: A Benchmark for ASR with Limited or No Supervision

Title Libri-Light: A Benchmark for ASR with Limited or No Supervision
Authors Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux
Abstract We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
Tasks Action Detection, Activity Detection, Speech Recognition
Published 2019-12-17
URL https://arxiv.org/abs/1912.07875v1
PDF https://arxiv.org/pdf/1912.07875v1.pdf
PWC https://paperswithcode.com/paper/libri-light-a-benchmark-for-asr-with-limited
Repo https://github.com/facebookresearch/libri-light
Framework none
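
For reference, the WER used in setting (3) is the word-level edit distance normalized by reference length; a self-contained sketch (the CER of setting (2) is the same computation over characters):

```python
def word_error_rate(ref, hyp):
    """WER = Levenshtein distance between reference and hypothesis word
    sequences, divided by the reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # 0.333...
```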

Towards Ethical Content-Based Detection of Online Influence Campaigns

Title Towards Ethical Content-Based Detection of Online Influence Campaigns
Authors Evan Crothers, Nathalie Japkowicz, Herna Viktor
Abstract The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use “named entity masking” (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research.
Tasks Language Identification, Native Language Identification
Published 2019-08-29
URL https://arxiv.org/abs/1908.11030v1
PDF https://arxiv.org/pdf/1908.11030v1.pdf
PWC https://paperswithcode.com/paper/towards-ethical-content-based-detection-of
Repo https://github.com/ecrows/l2-reddit-experiment
Framework tf
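
A sketch of the named-entity-masking idea using spaCy's off-the-shelf NER; the paper's exact masking scheme and NER model are not specified here, so treat the label format as an assumption:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any NER model works for this sketch

def named_entity_mask(text):
    """Replace named entities with their entity labels so a classifier
    cannot latch onto campaign-specific keywords (NEM, sketched)."""
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        out.append(text[last:ent.start_char])
        out.append(f"[{ent.label_}]")
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(named_entity_mask("Hillary Clinton visited Ohio."))
# typically -> "[PERSON] visited [GPE]."
```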

Deep Adversarial Social Recommendation

Title Deep Adversarial Social Recommendation
Authors Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li
Abstract Recent years have witnessed rapid developments in social recommendation techniques for improving recommender systems, owing to the growing influence of social networks on our daily lives. The majority of existing social recommendation methods unify the user representation for user-item interactions (item domain) and user-user connections (social domain). However, this may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous. In addition, most traditional recommender systems cannot efficiently optimize these objectives, since they rely on negative sampling, which fails to provide sufficiently informative guidance during training. In this paper, to address the aforementioned challenges, we propose DASO, a novel deep adversarial social recommendation framework. It adopts a bidirectional mapping method to transfer users’ information between the social domain and the item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.
Tasks Recommendation Systems, Representation Learning
Published 2019-05-30
URL https://arxiv.org/abs/1905.13160v1
PDF https://arxiv.org/pdf/1905.13160v1.pdf
PWC https://paperswithcode.com/paper/deep-adversarial-social-recommendation
Repo https://github.com/wenqifan03/GraphRec-WWW19
Framework pytorch
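
A toy sketch of the bidirectional-mapping idea: a generator maps a user's item-domain representation into the social domain while a discriminator tries to tell mapped representations from real social-domain ones (the reverse direction is symmetric). Network shapes and the loss bookkeeping are illustrative, not the authors' training code:

```python
import torch
import torch.nn as nn

dim = 64
gen_i2s = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
disc_s = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

def adversarial_losses(u_item, u_social):
    """u_item/u_social: (B, dim) user representations from each domain."""
    bce = nn.functional.binary_cross_entropy_with_logits
    fake = gen_i2s(u_item)   # item-domain user mapped into the social domain
    d_loss = bce(disc_s(u_social), torch.ones(u_social.size(0), 1)) + \
             bce(disc_s(fake.detach()), torch.zeros(u_item.size(0), 1))
    g_loss = bce(disc_s(fake), torch.ones(u_item.size(0), 1))
    return d_loss, g_loss
```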

Incorporating Sememes into Chinese Definition Modeling

Title Incorporating Sememes into Chinese Definition Modeling
Authors Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang
Abstract Chinese definition modeling is a challenging task that generates a dictionary definition in Chinese for a given Chinese word. To accomplish this task, we construct the Chinese Definition Modeling Corpus (CDM), which contains triples of word, sememes and the corresponding definition. We present two novel models to improve Chinese definition modeling: the Adaptive-Attention model (AAM) and the Self- and Adaptive-Attention Model (SAAM). AAM successfully incorporates sememes for generating the definition with an adaptive attention mechanism. It has the capability to decide which sememes to focus on and when to pay attention to sememes. SAAM further replaces recurrent connections in AAM with self-attention and relies entirely on the attention mechanism, reducing the path length between word, sememes and definition. Experiments on CDM demonstrate that by incorporating sememes, our best proposed model can outperform the state-of-the-art method by +6.0 BLEU.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.06512v1
PDF https://arxiv.org/pdf/1905.06512v1.pdf
PWC https://paperswithcode.com/paper/incorporating-sememes-into-chinese-definition
Repo https://github.com/blcu-nlp/chinese-definition
Framework pytorch
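
A sketch of the adaptive attention mechanism as described: a sentinel vector competes with the sememe embeddings for attention mass, so the decoder can choose when to attend to sememes and when to ignore them. Dimensions and the sentinel construction are assumptions, not the released AAM/SAAM code:

```python
import torch
import torch.nn as nn

class AdaptiveSememeAttention(nn.Module):
    """Adaptive attention over sememes with a learned sentinel (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, dim)
        self.sentinel = nn.Linear(dim, dim)

    def forward(self, dec_state, sememe_embs):
        # dec_state: (B, d); sememe_embs: (B, S, d)
        s = torch.tanh(self.sentinel(dec_state)).unsqueeze(1)   # (B, 1, d)
        cand = torch.cat([sememe_embs, s], dim=1)               # sememes + sentinel
        scores = torch.einsum('bd,bsd->bs', self.attn(dec_state), cand)
        w = torch.softmax(scores, dim=-1)
        context = torch.einsum('bs,bsd->bd', w, cand)
        return context, w[:, -1]  # sentinel weight = "ignore sememes" mass
```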

Gradient-based Adaptive Markov Chain Monte Carlo

Title Gradient-based Adaptive Markov Chain Monte Carlo
Authors Michalis K. Titsias, Petros Dellaportas
Abstract We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01373v2
PDF https://arxiv.org/pdf/1911.01373v2.pdf
PWC https://paperswithcode.com/paper/gradient-based-adaptive-markov-chain-monte
Repo https://github.com/mtitsias/gadMCMC
Framework none
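
A toy NumPy rendition of the mechanism on a Gaussian target: the proposal scale follows a stochastic gradient of the log-acceptance probability plus an entropy bonus, so adaptation proceeds even when candidates are rejected. This illustrates the idea only; the paper's generalised speed measure and full-covariance proposals are richer than this scalar-step sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):            # toy target: standard Gaussian
    return -0.5 * float(x @ x)

def grad_log_pi(x):
    return -x

d, sigma, beta, lr = 2, 0.1, 1.0, 0.01   # illustrative hyperparameters
x = np.zeros(d)
for _ in range(5000):
    eps = rng.standard_normal(d)
    y = x + sigma * eps                   # reparameterised proposal
    log_a = min(0.0, log_pi(y) - log_pi(x))
    # Pathwise gradient of log-acceptance w.r.t. sigma (zero when log_a == 0),
    # plus the derivative of the entropy bonus beta * d * log(sigma):
    # the scale is adapted even when the candidate ends up rejected.
    g = (0.0 if log_a == 0.0 else float(grad_log_pi(y) @ eps)) + beta * d / sigma
    sigma = max(sigma + lr * g, 1e-3)
    if np.log(rng.uniform()) < log_a:     # Metropolis accept/reject
        x = y
print(sigma)   # adapted proposal scale
```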

UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs

Title UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs
Authors Jian Zhu, Zuoyu Tian, Sandra Kübler
Abstract This paper describes the UM-IU@LING’s system for the SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.
Tasks
Published 2019-04-06
URL http://arxiv.org/abs/1904.03450v1
PDF http://arxiv.org/pdf/1904.03450v1.pdf
PWC https://paperswithcode.com/paper/um-iuling-at-semeval-2019-task-6-identifying
Repo https://github.com/zytian9/SemEval-2019-Task-6
Framework pytorch
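
The subtask B/C pipeline the abstract describes maps naturally onto scikit-learn; a minimal sketch in which the n-gram range, tf-idf weighting, and toy data are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# A linear SVM over character n-gram features, roughly as the paper
# describes for subtasks B and C (feature selection omitted here).
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(),
)
train_tweets = ["you are awful", "lovely day"]   # toy data
train_labels = ["OFF", "NOT"]
clf.fit(train_tweets, train_labels)
print(clf.predict(["what an awful take"]))
```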

NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep

Title NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep
Authors Parag Agrawal, Anshuman Suri
Abstract Existing machine learning techniques yield close-to-human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data, such as emoticons, slang, spelling mistakes, and code-mixed data, makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines lexical and deep-learning based methods for sentiment classification. We evaluate our system on the third task of SemEval-2019, ‘Contextual Emotion Detection in Text’. Our system performs significantly better than the baseline as well as our deep-learning model benchmarks, achieving a micro-averaged F1 score of 0.7765 and ranking 3rd on the test-set leaderboard. Our code is available at https://github.com/iamgroot42/nelec
Tasks Sentiment Analysis
Published 2019-04-05
URL http://arxiv.org/abs/1904.03223v1
PDF http://arxiv.org/pdf/1904.03223v1.pdf
PWC https://paperswithcode.com/paper/nelec-at-semeval-2019-task-3-think-twice
Repo https://github.com/iamgroot42/nelec
Framework tf
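
A hypothetical sketch of the neural-plus-lexical combination: concatenate a sentence embedding from any neural encoder with hand-crafted lexical counts and train a simple classifier on top. The specific features, classifier, and toy data are assumptions, not the NELEC implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lexical_feats(text):
    """A few hand-crafted lexical signals of the kind chat data carries."""
    return np.array([
        text.count(":)") + text.count(":D"),   # emoticon count
        sum(c.isupper() for c in text),        # shouting
        text.count("!"),
    ], dtype=float)

def combine(neural_emb, text):
    return np.concatenate([neural_emb, lexical_feats(text)])

# Toy usage: random vectors stand in for a real neural sentence encoder.
rng = np.random.default_rng(0)
texts, labels = ["great :D", "WHY?!"], ["happy", "angry"]
X = np.stack([combine(rng.standard_normal(8), t) for t in texts])
clf = LogisticRegression().fit(X, labels)
```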

CraftAssist: A Framework for Dialogue-enabled Interactive Agents

Title CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Authors Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam
Abstract This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions. The purpose of building such an assistant is to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions.
Tasks
Published 2019-07-19
URL https://arxiv.org/abs/1907.08584v1
PDF https://arxiv.org/pdf/1907.08584v1.pdf
PWC https://paperswithcode.com/paper/craftassist-a-framework-for-dialogue-enabled
Repo https://github.com/facebookresearch/craftassist
Framework pytorch

Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling

Title Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling
Authors Yifan Gao, Piji Li, Irwin King, Michael R. Lyu
Abstract We study the problem of generating interconnected questions in question-answering style conversations. Compared with previous works which generate questions based on a single sentence (or paragraph), this setting is different in two major aspects: (1) Questions are highly conversational. Almost half of them refer back to conversation history using coreferences. (2) In a coherent conversation, questions have smooth transitions between turns. We propose an end-to-end neural model with coreference alignment and conversation flow modeling. The coreference alignment modeling explicitly aligns coreferent mentions in conversation history with corresponding pronominal references in generated questions, which makes generated questions interconnected to conversation history. The conversation flow modeling builds a coherent conversation by starting questioning on the first few sentences in a text passage and smoothly shifting the focus to later parts. Extensive experiments show that our system outperforms several baselines and can generate highly conversational questions. The code implementation is released at https://github.com/Evan-Gao/conversational-QG.
Tasks Question Answering, Question Generation
Published 2019-06-17
URL https://arxiv.org/abs/1906.06893v1
PDF https://arxiv.org/pdf/1906.06893v1.pdf
PWC https://paperswithcode.com/paper/interconnected-question-generation-with
Repo https://github.com/Evan-Gao/conversational-QG
Framework pytorch
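
A sketch of coreference alignment read as supervised attention: when the generated question emits a pronoun, the decoder's attention is pushed toward the antecedent mention in the conversation history. The loss form and inputs are assumptions rather than the released implementation:

```python
import torch

def coref_alignment_loss(attn_weights, coref_positions):
    """Supervised-attention sketch of coreference alignment.

    attn_weights: (T, S) decoder attention over history/source tokens.
    coref_positions: list of (t, src_idx) pairs linking a generated pronoun
    at decoding step t to its antecedent token src_idx in the history.
    """
    loss = 0.0
    for t, src_idx in coref_positions:
        # maximize the attention mass placed on the coreferent mention
        loss = loss - torch.log(attn_weights[t, src_idx] + 1e-9)
    return loss / max(len(coref_positions), 1)
```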