Paper Group AWR 452
Stacked Capsule Autoencoders
Title | Stacked Capsule Autoencoders |
Authors | Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton |
Abstract | Objects are composed of a set of geometrically organized parts. We introduce an unsupervised capsule autoencoder (SCAE), which explicitly uses geometric relationships between parts to reason about objects. Since these relationships do not depend on the viewpoint, our model is robust to viewpoint changes. SCAE consists of two stages. In the first stage, the model predicts presences and poses of part templates directly from the image and tries to reconstruct the image by appropriately arranging the templates. In the second stage, SCAE predicts parameters of a few object capsules, which are then used to reconstruct part poses. Inference in this model is amortized and performed by off-the-shelf neural encoders, unlike in previous capsule networks. We find that object capsule presences are highly informative of the object class, which leads to state-of-the-art results for unsupervised classification on SVHN (55%) and MNIST (98.7%). The code is available at https://github.com/google-research/google-research/tree/master/stacked_capsule_autoencoders |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06818v2 |
https://arxiv.org/pdf/1906.06818v2.pdf | |
PWC | https://paperswithcode.com/paper/stacked-capsule-autoencoders |
Repo | https://github.com/akosiorek/stacked_capsule_autoencoders |
Framework | tf |
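
The viewpoint argument in the abstract reduces to affine algebra: an object capsule's object-viewer pose (OV) composed with fixed object-part relationships (OP) yields the part poses, so a viewpoint change moves every part consistently. A minimal numpy sketch of that composition, with all shapes and values hypothetical (the actual model learns templates, presence probabilities, and amortized encoders on top of this):

```python
import numpy as np

def make_affine(theta, tx, ty, scale=1.0):
    """3x3 homogeneous 2-D affine pose matrix (rotation, scale, shift)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0,        0.0,       1.0]])

# Hypothetical object-viewer pose (OV): where the object sits in the image.
ov = make_affine(theta=0.3, tx=5.0, ty=-2.0)

# Hypothetical object-part relationships (OP): viewpoint-independent,
# one fixed matrix per part.
op = [make_affine(0.0, 1.0, 0.0), make_affine(np.pi / 2, -1.0, 0.5)]

# Predicted part poses are the compositions OV @ OP_k; a new viewpoint
# (a different OV) moves all parts consistently, which is why object
# capsule presences can be viewpoint-robust.
for k, p in enumerate(ov @ m for m in op):
    print(f"part {k} position: {p[:2, 2]}")
```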
ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents
Title | ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents |
Authors | Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, Andreas Maier |
Abstract | This competition investigates the performance of large-scale retrieval of historical document images based on writing style. It builds on large image data sets provided by cultural heritage institutions and digital libraries, totalling 20 000 document images representing about 10 000 writers, divided into three types: writers of (i) manuscript books, (ii) letters, and (iii) charters and legal documents. We focus on the task of automatic image retrieval to simulate common scenarios of humanities research, such as writer retrieval. Most teams submitted traditional methods that do not use deep learning techniques. The competition results show that a combination of methods outperforms single methods. Furthermore, letters are much more difficult to retrieve than manuscripts. |
Tasks | Image Retrieval |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03713v1 |
https://arxiv.org/pdf/1912.03713v1.pdf | |
PWC | https://paperswithcode.com/paper/icdar-2019-competition-on-image-retrieval-for |
Repo | https://github.com/masyagin1998/robin |
Framework | tf |
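
Writer retrieval of this kind is typically scored with mean average precision over queries. A toy average-precision computation, assuming the whole gallery is ranked for each query (so relevant-retrieved equals relevant-total); the competition's exact protocol may differ:

```python
import numpy as np

def average_precision(ranked_writer_ids, query_writer_id):
    """AP for one query: ranked_writer_ids is the full gallery sorted by
    descending similarity; relevant items share the query's writer id."""
    hits, precisions = 0, []
    for rank, wid in enumerate(ranked_writer_ids, start=1):
        if wid == query_writer_id:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

# Toy gallery: writer ids of the ranked documents for one query.
print(average_precision(["w1", "w7", "w1", "w3", "w1"], "w1"))  # ~0.756
```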
Segmenting the Future
Title | Segmenting the Future |
Authors | Hsu-kuang Chiu, Ehsan Adeli, Juan Carlos Niebles |
Abstract | Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specific for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes and Apolloscape outperform the baseline and current state-of-the-art methods. Code is available at https://github.com/eddyhkchiu/segmenting_the_future/. |
Tasks | Autonomous Driving, Decision Making, Scene Understanding, Semantic Segmentation |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.10666v2 |
https://arxiv.org/pdf/1904.10666v2.pdf | |
PWC | https://paperswithcode.com/paper/segmenting-the-future |
Repo | https://github.com/eddyhkchiu/segmenting_the_future |
Framework | none |
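
The abstract couples the forecasting network with a knowledge-distillation training scheme specific to the task. One plausible reading, sketched below with hypothetical weighting: a teacher that segments the observed future frame provides soft targets, blended with hard-label cross-entropy; the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def forecasting_distillation_loss(student_logits, teacher_logits, labels,
                                  alpha=0.5, T=2.0):
    """Blend hard-label cross-entropy with a soft KL term against the
    teacher. Shapes: logits (B, C, H, W), labels (B, H, W). The weight
    alpha and temperature T are placeholders, not the paper's values."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kl

s, t = torch.randn(2, 19, 8, 8), torch.randn(2, 19, 8, 8)
y = torch.randint(0, 19, (2, 8, 8))
print(forecasting_distillation_loss(s, t, y))
```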
Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction
Title | Beyond Personalization: Social Content Recommendation for Creator Equality and Consumer Satisfaction |
Authors | Wenyi Xiao, Huan Zhao, Haojie Pan, Yangqiu Song, Vincent W. Zheng, Qiang Yang |
Abstract | An effective content recommendation system in modern social media platforms should benefit both creators, by bringing them genuine rewards, and consumers, by helping them find genuinely interesting content. In this paper, we propose a model called Social Explorative Attention Network (SEAN) for content recommendation. SEAN uses a personalized content recommendation model to encourage recommendations driven by personal interests. Moreover, SEAN allows the personalization factors to attend to users’ higher-order friends on the social network to improve the accuracy and diversity of recommendation results. Constructing two datasets from Steemit, a popular decentralized content distribution platform, we compare SEAN with state-of-the-art collaborative filtering (CF) and content-based recommendation approaches. Experimental results demonstrate the effectiveness of SEAN in terms of both Gini coefficients for recommendation equality and F1 scores for recommendation performance. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11900v3 |
https://arxiv.org/pdf/1905.11900v3.pdf | |
PWC | https://paperswithcode.com/paper/beyond-personalization-social-content |
Repo | https://github.com/HKUST-KnowComp/Social-Explorative-Attention-Networks |
Framework | none |
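
Recommendation equality is reported via Gini coefficients. A small self-contained Gini implementation over per-creator exposure counts (the input quantity is our assumption; the paper defines exactly what is measured):

```python
import numpy as np

def gini(exposures):
    """Gini coefficient of per-creator exposures (0 = perfectly equal,
    approaching 1 = all recommendations go to one creator)."""
    x = np.sort(np.asarray(exposures, dtype=float))
    n = x.size
    total = x.sum()
    # Standard formula: G = 2 * sum_i(i * x_i) / (n * sum x) - (n + 1) / n
    return 2.0 * np.sum(np.arange(1, n + 1) * x) / (n * total) - (n + 1.0) / n

print(gini([10, 10, 10, 10]))   # 0.0  -> perfectly equal exposure
print(gini([0, 0, 0, 40]))      # 0.75 -> highly concentrated
```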
GradNet: Gradient-Guided Network for Visual Object Tracking
Title | GradNet: Gradient-Guided Network for Visual Object Tracking |
Authors | Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu |
Abstract | The fully-convolutional siamese network based on template matching has shown great potential in visual tracking. During testing, the template is fixed to the initial target feature, and performance relies entirely on the general matching ability of the siamese network. However, this approach cannot capture temporal variations of targets or background clutter. In this work, we propose a novel gradient-guided network that exploits the discriminative information in gradients to update the template in the siamese network through feed-forward and backward operations, capturing the core attention of the target. Specifically, the algorithm uses the gradient information to update the template in the current frame. In addition, a template generalization training method is proposed to better use gradient information and avoid overfitting. To our knowledge, this work is the first attempt to exploit gradient information for template updates in siamese-based trackers. Extensive experiments on recent benchmarks demonstrate that our method achieves better performance than other state-of-the-art trackers. |
Tasks | Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06800v1 |
https://arxiv.org/pdf/1909.06800v1.pdf | |
PWC | https://paperswithcode.com/paper/gradnet-gradient-guided-network-for-visual |
Repo | https://github.com/LPXTT/GradNet-Pytorch |
Framework | pytorch
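
The mechanism named in the abstract — a forward pass, a backward pass, and a template update from the resulting gradient — can be sketched as one gradient step on the template. This raw-SGD update is a simplification: GradNet learns sub-networks that transform the gradient rather than applying it directly, and all shapes below are hypothetical.

```python
import torch
import torch.nn.functional as F

def update_template(template, search_feat, target_map, lr=0.1):
    """One hypothetical gradient-guided update: correlate the template
    with search-region features, score the response map, and move the
    template along its own gradient."""
    template = template.clone().requires_grad_(True)
    response = F.conv2d(search_feat, template)  # template as conv kernel
    loss = F.mse_loss(response, target_map)
    grad, = torch.autograd.grad(loss, template)
    return (template - lr * grad).detach()

template = torch.randn(1, 256, 6, 6)    # initial target feature
search = torch.randn(1, 256, 22, 22)    # current-frame features
target = torch.zeros(1, 1, 17, 17)
target[0, 0, 8, 8] = 1.0                # desired response peak
template = update_template(template, search, target)
```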
Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments
Title | Retrieval-based Localization Based on Domain-invariant Feature Learning under Changing Environments |
Authors | Hanjiang Hu, Hesheng Wang, Zhe Liu, Chenguang Yang, Weidong Chen, Le Xie |
Abstract | Visual localization is a crucial problem in mobile robotics and autonomous driving. One solution is to retrieve images with known pose from a database for the localization of query images. However, in environments with drastically varying conditions (e.g. illumination changes, seasons, occlusion, dynamic objects), retrieval-based localization is severely hampered and becomes a challenging problem. In this paper, a novel domain-invariant feature learning method (DIFL) is proposed based on ComboGAN, a multi-domain image translation network architecture. By introducing a feature consistency loss (FCL) between the encoded features of the original image and translated image in another domain, we are able to train the encoders to generate domain-invariant features in a self-supervised manner. To retrieve a target image from the database, the query image is first encoded using the encoder belonging to the query domain to obtain a domain-invariant feature vector. We then perform retrieval by selecting the database image with the most similar domain-invariant feature vector. We validate the proposed approach on the CMU-Seasons dataset, where we outperform state-of-the-art learning-based descriptors in retrieval-based localization for high and medium precision scenarios. |
Tasks | Autonomous Driving, Visual Localization |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10184v1 |
https://arxiv.org/pdf/1909.10184v1.pdf | |
PWC | https://paperswithcode.com/paper/190910184 |
Repo | https://github.com/HanjiangHu/DIFL-FCL |
Framework | pytorch |
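
Two pieces of the abstract lend themselves to a sketch: the feature consistency loss (FCL) between an image and its translation encoded in another domain, and retrieval by nearest domain-invariant feature. The L1 distance and cosine similarity below are assumptions, not the paper's exact definitions:

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(encoder_a, encoder_b, img_a, img_a_to_b):
    """FCL sketch: an image encoded in its own domain and its translation
    encoded in the target domain should yield the same feature (L1 is an
    assumed distance)."""
    return F.l1_loss(encoder_a(img_a), encoder_b(img_a_to_b))

def retrieve(query_feat, db_feats):
    """Localization: pick the database image whose domain-invariant
    feature is most similar to the query's (cosine similarity assumed)."""
    q = F.normalize(query_feat, dim=0)
    db = F.normalize(db_feats, dim=1)
    return int(torch.argmax(db @ q))

best = retrieve(torch.randn(128), torch.randn(1000, 128))
```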
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Title | Libri-Light: A Benchmark for ASR with Limited or No Supervision |
Authors | Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux |
Abstract | We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art. |
Tasks | Action Detection, Activity Detection, Speech Recognition |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07875v1 |
https://arxiv.org/pdf/1912.07875v1.pdf | |
PWC | https://paperswithcode.com/paper/libri-light-a-benchmark-for-asr-with-limited |
Repo | https://github.com/facebookresearch/libri-light |
Framework | none |
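
Setting (3) is scored with word error rate (WER) on the LibriSpeech dev and test sets. A self-contained WER implementation via word-level edit distance, for reference:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```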
Towards Ethical Content-Based Detection of Online Influence Campaigns
Title | Towards Ethical Content-Based Detection of Online Influence Campaigns |
Authors | Evan Crothers, Nathalie Japkowicz, Herna Viktor |
Abstract | The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use “named entity masking” (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research. |
Tasks | Language Identification, Native Language Identification |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11030v1 |
https://arxiv.org/pdf/1908.11030v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-ethical-content-based-detection-of |
Repo | https://github.com/ecrows/l2-reddit-experiment |
Framework | tf |
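
A plausible sketch of named entity masking (NEM) using spaCy: each detected entity is replaced by its label, so a classifier cannot key on campaign-specific names. The replacement format and the choice of spaCy are assumptions; the paper's masking scheme may differ. (Requires `python -m spacy download en_core_web_sm`.)

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def mask_named_entities(text):
    """Replace each named entity span with its entity label."""
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        out.append(text[last:ent.start_char])
        out.append(f"[{ent.label_}]")
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(mask_named_entities("Hillary Clinton met voters in Ohio."))
# e.g. "[PERSON] met voters in [GPE]." (labels depend on the model)
```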
Deep Adversarial Social Recommendation
Title | Deep Adversarial Social Recommendation |
Authors | Wenqi Fan, Tyler Derr, Yao Ma, Jianping Wang, Jiliang Tang, Qing Li |
Abstract | Recent years have witnessed rapid developments in social recommendation techniques for improving the performance of recommender systems, owing to the growing influence of social networks on our daily lives. The majority of existing social recommendation methods unify the user representation for user-item interactions (item domain) and user-user connections (social domain). However, this may restrain user representation learning in each respective domain, since users behave and interact differently in the two domains, which makes their representations heterogeneous. In addition, most traditional recommender systems cannot efficiently optimize these objectives, since they rely on negative sampling, which is unable to provide sufficiently informative guidance during training. In this paper, to address these challenges, we propose DASO, a novel deep adversarial social recommendation framework. It adopts a bidirectional mapping method to transfer users’ information between the social domain and the item domain using adversarial learning. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework. |
Tasks | Recommendation Systems, Representation Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13160v1 |
https://arxiv.org/pdf/1905.13160v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-adversarial-social-recommendation |
Repo | https://github.com/wenqifan03/GraphRec-WWW19 |
Framework | pytorch |
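
The bidirectional adversarial mapping can be sketched in a few lines: a generator carries social-domain user vectors into the item domain while a discriminator distinguishes mapped vectors from native ones (the symmetric item-to-social direction is omitted). All module shapes and the non-saturating GAN loss below are assumptions; DASO's actual objectives differ in detail.

```python
import torch
import torch.nn as nn

dim = 64
# Hypothetical generator (social -> item) and item-domain discriminator.
g_s2i = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
disc_i = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
bce = nn.BCEWithLogitsLoss()

social_u = torch.randn(32, dim)   # user vectors learned in social domain
item_u = torch.randn(32, dim)     # user vectors learned in item domain

mapped = g_s2i(social_u)
# Discriminator: real item-domain vectors vs. mapped social-domain vectors.
d_loss = bce(disc_i(item_u), torch.ones(32, 1)) + \
         bce(disc_i(mapped.detach()), torch.zeros(32, 1))
# Generator: fool the discriminator (non-saturating loss).
g_loss = bce(disc_i(mapped), torch.ones(32, 1))
```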
Incorporating Sememes into Chinese Definition Modeling
Title | Incorporating Sememes into Chinese Definition Modeling |
Authors | Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang |
Abstract | Chinese definition modeling is a challenging task that generates a dictionary definition in Chinese for a given Chinese word. To accomplish this task, we construct the Chinese Definition Modeling Corpus (CDM), which contains triples of word, sememes and the corresponding definition. We present two novel models to improve Chinese definition modeling: the Adaptive-Attention model (AAM) and the Self- and Adaptive-Attention Model (SAAM). AAM successfully incorporates sememes for generating the definition with an adaptive attention mechanism. It has the capability to decide which sememes to focus on and when to pay attention to sememes. SAAM further replaces recurrent connections in AAM with self-attention and relies entirely on the attention mechanism, reducing the path length between word, sememes and definition. Experiments on CDM demonstrate that by incorporating sememes, our best proposed model can outperform the state-of-the-art method by +6.0 BLEU. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06512v1 |
https://arxiv.org/pdf/1905.06512v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-sememes-into-chinese-definition |
Repo | https://github.com/blcu-nlp/chinese-definition |
Framework | pytorch |
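
One way to read the adaptive attention mechanism: an attention distribution over sememe embeddings produces a sememe context vector, and a learned scalar gate decides, per decoding step, how much to rely on sememes versus the decoder's own state. A hypothetical single-step sketch (all parameter names invented):

```python
import torch
import torch.nn.functional as F

def adaptive_attention(dec_state, sememe_embs, W_q, W_k, w_gate):
    """dec_state: (d,); sememe_embs: (n, d). The gate implements
    'when to pay attention to sememes'; the softmax implements
    'which sememes to focus on'."""
    q = dec_state @ W_q                        # query from decoder state
    keys = sememe_embs @ W_k                   # keys from sememes
    attn = F.softmax(keys @ q, dim=0)          # which sememes
    sememe_ctx = attn @ sememe_embs            # sememe context vector
    gate = torch.sigmoid(dec_state @ w_gate)   # when to use sememes
    return gate * sememe_ctx + (1 - gate) * dec_state

d, n = 8, 4
ctx = adaptive_attention(torch.randn(d), torch.randn(n, d),
                         torch.randn(d, d), torch.randn(d, d),
                         torch.randn(d))
```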
Gradient-based Adaptive Markov Chain Monte Carlo
Title | Gradient-based Adaptive Markov Chain Monte Carlo |
Authors | Michalis K. Titsias, Petros Dellaportas |
Abstract | We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01373v2 |
https://arxiv.org/pdf/1911.01373v2.pdf | |
PWC | https://paperswithcode.com/paper/gradient-based-adaptive-markov-chain-monte |
Repo | https://github.com/mtitsias/gadMCMC |
Framework | none |
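
The property highlighted in the abstract — adaptation even when candidates are rejected — is visible in a toy version: the gradient of the log acceptance probability with respect to the proposal scale is available on every iteration. The sketch below adapts log σ of a random-walk Metropolis proposal on a standard-normal target by ascending (log acceptance + entropy bonus); the paper's generalised speed measure, MALA variant, and full-covariance adaptation are richer than this.

```python
import numpy as np

def log_target(x):                 # toy target: standard normal
    return -0.5 * float(x @ x)

def grad_log_target(x):
    return -x

def adaptive_rwm(x0, n_steps=5000, lr=0.005, beta=0.2, seed=0):
    """Toy gradient-based adaptation of a random-walk Metropolis scale."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d, log_sigma, samples = x.size, 0.0, []
    for _ in range(n_steps):
        eps = rng.standard_normal(d)
        y = x + np.exp(log_sigma) * eps
        log_alpha = min(0.0, log_target(y) - log_target(x))
        # d(log alpha)/d(log sigma) via y = x + exp(log_sigma) * eps;
        # zero when the move is accepted with probability 1. This signal
        # exists whether or not the candidate is accepted.
        g = 0.0 if log_alpha == 0.0 else \
            float(grad_log_target(y) @ eps) * np.exp(log_sigma)
        log_sigma += lr * (g + beta * d)   # entropy bonus ~ beta * d
        if np.log(rng.uniform()) < log_alpha:
            x = y
        samples.append(x.copy())
    return np.array(samples), np.exp(log_sigma)

samples, sigma = adaptive_rwm(np.zeros(2))
print(sigma, samples.mean(axis=0))
```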
UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs
Title | UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs |
Authors | Jian Zhu, Zuoyu Tian, Sandra Kübler |
Abstract | This paper describes the UM-IU@LING’s system for the SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03450v1 |
http://arxiv.org/pdf/1904.03450v1.pdf | |
PWC | https://paperswithcode.com/paper/um-iuling-at-semeval-2019-task-6-identifying |
Repo | https://github.com/zytian9/SemEval-2019-Task-6 |
Framework | pytorch |
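
The subtask B/C component is a linear SVM over selected character n-gram features. A minimal scikit-learn version under assumed hyperparameters (the paper's n-gram range and feature selection are not reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Character n-grams within word boundaries; (2, 5) is a guess at the range.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(),
)

tweets = ["you are all horrible", "lovely day for a walk"]
labels = ["OFF", "NOT"]
clf.fit(tweets, labels)
print(clf.predict(["what a horrible take"]))
```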
NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep
Title | NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep |
Authors | Parag Agrawal, Anshuman Suri |
Abstract | Existing machine learning techniques yield close to human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data, such as emoticons, slang, spelling mistakes, and code-mixed data, makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines lexical and deep-learning based methods for sentiment classification. We evaluate our system on the third task of SemEval-2019, ‘Contextual Emotion Detection in Text’. Our system performs significantly better than the baseline, as well as our deep-learning model benchmarks. It achieved a micro-averaged F1 score of 0.7765, ranking 3rd on the test-set leader-board. Our code is available at https://github.com/iamgroot42/nelec |
Tasks | Sentiment Analysis |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03223v1 |
http://arxiv.org/pdf/1904.03223v1.pdf | |
PWC | https://paperswithcode.com/paper/nelec-at-semeval-2019-task-3-think-twice |
Repo | https://github.com/iamgroot42/nelec |
Framework | tf |
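
The leaderboard metric is micro-averaged F1; in the EmoContext setup the average is taken over the three emotion classes only (our reading of the task; verify against the task description). With scikit-learn:

```python
from sklearn.metrics import f1_score

# `labels=` restricts the micro average to the emotion classes,
# excluding the majority "others" class.
y_true = ["happy", "sad", "others", "angry", "others", "sad"]
y_pred = ["happy", "sad", "others", "others", "others", "angry"]
print(f1_score(y_true, y_pred, labels=["happy", "sad", "angry"],
               average="micro"))
```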
CraftAssist: A Framework for Dialogue-enabled Interactive Agents
Title | CraftAssist: A Framework for Dialogue-enabled Interactive Agents |
Authors | Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam |
Abstract | This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions. The purpose of building such an assistant is to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08584v1 |
https://arxiv.org/pdf/1907.08584v1.pdf | |
PWC | https://paperswithcode.com/paper/craftassist-a-framework-for-dialogue-enabled |
Repo | https://github.com/facebookresearch/craftassist |
Framework | pytorch |
Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling
Title | Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling |
Authors | Yifan Gao, Piji Li, Irwin King, Michael R. Lyu |
Abstract | We study the problem of generating interconnected questions in question-answering style conversations. Compared with previous works which generate questions based on a single sentence (or paragraph), this setting is different in two major aspects: (1) Questions are highly conversational. Almost half of them refer back to conversation history using coreferences. (2) In a coherent conversation, questions have smooth transitions between turns. We propose an end-to-end neural model with coreference alignment and conversation flow modeling. The coreference alignment modeling explicitly aligns coreferent mentions in conversation history with corresponding pronominal references in generated questions, which makes generated questions interconnected to conversation history. The conversation flow modeling builds a coherent conversation by starting questioning on the first few sentences in a text passage and smoothly shifting the focus to later parts. Extensive experiments show that our system outperforms several baselines and can generate highly conversational questions. The code implementation is released at https://github.com/Evan-Gao/conversational-QG. |
Tasks | Question Answering, Question Generation |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06893v1 |
https://arxiv.org/pdf/1906.06893v1.pdf | |
PWC | https://paperswithcode.com/paper/interconnected-question-generation-with |
Repo | https://github.com/Evan-Gao/conversational-QG |
Framework | pytorch |
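
The coreference alignment term can be sketched as an auxiliary loss that, when the model generates a pronoun, concentrates its attention over the conversation history on the coreferent mention's tokens. A hypothetical form (the paper's exact loss may differ):

```python
import torch

def coref_alignment_loss(attn_weights, coref_mask):
    """attn_weights: (T,) softmaxed attention over history tokens;
    coref_mask: (T,) with 1.0 on the coreferent mention's tokens.
    Maximizes the attention mass placed on the mention."""
    p_mention = (attn_weights * coref_mask).sum()
    return -torch.log(p_mention + 1e-12)

attn = torch.softmax(torch.randn(10), dim=0)
mask = torch.zeros(10)
mask[3:5] = 1.0   # the mention spans tokens 3-4
loss = coref_alignment_loss(attn, mask)
```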