January 24, 2020


Paper Group NANR 259

Combining adaptive algorithms and hypergradient method: a performance and robustness study. Better Modeling of Incomplete Annotations for Named Entity Recognition. Hierarchical Self-Attention Network for Action Localization in Videos. Data-Driven Morphological Analysis for Uralic Languages. Critical Learning Periods in Deep Networks. SSF-DAN: Separ …

Combining adaptive algorithms and hypergradient method: a performance and robustness study


Title	Combining adaptive algorithms and hypergradient method: a performance and robustness study
Authors	Akram Erraqabi, Nicolas Le Roux
Abstract	Wilson et al. (2017) showed that, when the stepsize schedule is properly designed, stochastic gradient generalizes better than ADAM (Kingma & Ba, 2014). In light of recent work on hypergradient methods (Baydin et al., 2018), we revisit these claims to see if such methods close the gap between the most popular optimizers. As a byproduct, we analyze the true benefit of these hypergradient methods compared to more classical schedules, such as the fixed decay of Wilson et al. (2017). In particular, we observe they are of marginal help since their performance varies significantly when tuning their hyperparameters. Finally, as robustness is a critical quality of an optimizer, we provide a sensitivity analysis of these gradient based optimizers to assess how challenging their tuning is.
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=rJgSV3AqKQ
PDF	https://openreview.net/pdf?id=rJgSV3AqKQ
PWC	https://paperswithcode.com/paper/combining-adaptive-algorithms-and
Repo
Framework

Better Modeling of Incomplete Annotations for Named Entity Recognition


Title	Better Modeling of Incomplete Annotations for Named Entity Recognition
Authors	Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li
Abstract	Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information. However, in practice, annotated data can often be imperfect with one typical issue being the training data may contain incomplete annotations. We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. We demonstrate the effectiveness of our approach through extensive experiments.
Tasks	Named Entity Recognition
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-1079/
PDF	https://www.aclweb.org/anthology/N19-1079
PWC	https://paperswithcode.com/paper/better-modeling-of-incomplete-annotations-for
Repo
Framework

Hierarchical Self-Attention Network for Action Localization in Videos


Title	Hierarchical Self-Attention Network for Action Localization in Videos
Authors	Rizard Renanda Adhi Pramono, Yie-Tarng Chen, Wen-Hsien Fang
Abstract	This paper presents a novel Hierarchical Self-Attention Network (HISAN) to generate spatial-temporal tubes for action localization in videos. The essence of HISAN is to combine the two-stream convolutional neural network (CNN) with a hierarchical bidirectional self-attention mechanism, which comprises two levels of bidirectional self-attention to efficaciously capture both the long-term temporal dependency information and the spatial context information to render more precise action localization. Also, a sequence rescoring (SR) algorithm is employed to resolve the dilemma of inconsistent detection scores incurred by occlusion or background clutter. Moreover, a new fusion scheme is invoked, which integrates not only the appearance and motion information from the two-stream network, but also the motion saliency to mitigate the effect of camera motion. Simulations reveal that the new approach achieves performance competitive with state-of-the-art works in terms of action localization and recognition accuracy on the widespread UCF101-24 and J-HMDB datasets.
Tasks	Action Localization
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Pramono_Hierarchical_Self-Attention_Network_for_Action_Localization_in_Videos_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Pramono_Hierarchical_Self-Attention_Network_for_Action_Localization_in_Videos_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/hierarchical-self-attention-network-for
Repo
Framework

Data-Driven Morphological Analysis for Uralic Languages


Title	Data-Driven Morphological Analysis for Uralic Languages
Authors	Miikka Silfverberg, Francis Tyers
Abstract
Tasks	Lemmatization, Morphological Analysis
Published	2019-01-01
URL	https://www.aclweb.org/anthology/W19-0301/
PDF	https://www.aclweb.org/anthology/W19-0301
PWC	https://paperswithcode.com/paper/data-driven-morphological-analysis-for-uralic
Repo
Framework

Critical Learning Periods in Deep Networks


Title	Critical Learning Periods in Deep Networks
Authors	Alessandro Achille, Matteo Rovere, Stefano Soatto
Abstract	Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of “Information Plasticity”. Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing.
Tasks	Representation Learning
Published	2019-05-01
URL	https://openreview.net/forum?id=BkeStsCcKQ
PDF	https://openreview.net/pdf?id=BkeStsCcKQ
PWC	https://paperswithcode.com/paper/critical-learning-periods-in-deep-networks
Repo
Framework

SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation


Title	SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation
Authors	Liang Du, Jingang Tan, Hongye Yang, Jianfeng Feng, Xiangyang Xue, Qibao Zheng, Xiaoqing Ye, Xiaolin Zhang
Abstract	Despite the great success achieved by supervised fully convolutional models in semantic segmentation, training the models requires a large amount of labor-intensive work to generate pixel-level annotations. Recent works exploit synthetic data to train the model for semantic segmentation, but the domain adaptation between real and synthetic images remains a challenging problem. In this work, we propose a Separated Semantic Feature based domain adaptation network, named SSF-DAN, for semantic segmentation. First, a Semantic-wise Separable Discriminator (SS-D) is designed to independently adapt semantic features across the target and source domains, which addresses the inconsistent adaptation issue in the class-wise adversarial learning. In SS-D, a progressive confidence strategy is included to achieve a more reliable separation. Then, an efficient Class-wise Adversarial loss Reweighting module (CA-R) is introduced to balance the class-wise adversarial learning process, which leads the generator to focus more on poorly adapted classes. The presented framework demonstrates robust performance, superior to state-of-the-art methods on benchmark datasets.
Tasks	Domain Adaptation, Semantic Segmentation
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Du_SSF-DAN_Separated_Semantic_Feature_Based_Domain_Adaptation_Network_for_Semantic_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Du_SSF-DAN_Separated_Semantic_Feature_Based_Domain_Adaptation_Network_for_Semantic_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/ssf-dan-separated-semantic-feature-based
Repo
Framework

BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation


Title	BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation
Authors	Ruoyao Yang, Wanying Xie, Chunhua Liu, Dong Yu
Abstract	Researchers have been paying increasing attention to rumour evaluation due to the rapid spread of unsubstantiated rumours on social media platforms, including SemEval 2019 task 7. However, labelled data for learning rumour veracity is scarce, and labels in rumour stance data are highly disproportionate, making it challenging for a model to perform supervised-learning adequately. We propose an inference chain-based system, which fully utilizes conversation structure-based knowledge in the limited data and expands the training data in minority categories to alleviate class imbalance. Our approach obtains 12.6% improvement upon the baseline system for subtask A, ranks 1st among 21 systems in subtask A, and ranks 4th among 12 systems in subtask B.
Tasks	Rumour Detection
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2191/
PDF	https://www.aclweb.org/anthology/S19-2191
PWC	https://paperswithcode.com/paper/blcu_nlp-at-semeval-2019-task-7-an-inference
Repo
Framework

DON’T JUDGE A BOOK BY ITS COVER - ON THE DYNAMICS OF RECURRENT NEURAL NETWORKS


Title	DON’T JUDGE A BOOK BY ITS COVER - ON THE DYNAMICS OF RECURRENT NEURAL NETWORKS
Authors	Doron Haviv, Alexander Rivkind, Omri Barak
Abstract	To be effective in sequential data processing, Recurrent Neural Networks (RNNs) are required to keep track of past events by creating memories. Consequently RNNs are harder to train than their feedforward counterparts, prompting the developments of both dedicated units such as LSTM and GRU and of a handful of training tricks. In this paper, we investigate the effect of different training protocols on the representation of memories in RNN. While reaching similar performance for different protocols, RNNs are shown to exhibit substantial differences in their ability to generalize for unforeseen tasks or conditions. We analyze the dynamics of the network’s hidden state, and uncover the reasons for this difference. Each memory is found to be associated with a nearly steady state of the dynamics whose speed predicts performance on unforeseen tasks and which we refer to as a ‘slow point’. By tracing the formation of the slow points we are able to understand the origin of differences between training protocols. Our results show that multiple solutions to the same task exist but may rely on different dynamical mechanisms, and that training protocols can bias the choice of such solutions in an interpretable way.
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=H1z_Z2A5tX
PDF	https://openreview.net/pdf?id=H1z_Z2A5tX
PWC	https://paperswithcode.com/paper/dont-judge-a-book-by-its-cover-on-the
Repo
Framework

Columbia at SemEval-2019 Task 7: Multi-task Learning for Stance Classification and Rumour Verification


Title	Columbia at SemEval-2019 Task 7: Multi-task Learning for Stance Classification and Rumour Verification
Authors	Zhuoran Liu, Shivali Goel, Mukund Yelahanka Raghuprasad, Smaranda Muresan
Abstract	The paper presents Columbia team's participation in the SemEval 2019 Shared Task 7: RumourEval 2019. Detecting rumour on social networks has been a focus of research in recent years. Previous work suffered from data sparsity, which potentially limited the application of more sophisticated neural architecture to this task. We mitigate this problem by proposing a multi-task learning approach together with language model fine-tuning. Our attention-based model allows different tasks to leverage different level of information. Our system ranked 6th overall with an F1-score of 36.25 on stance classification and F1 of 22.44 on rumour verification.
Tasks	Language Modelling, Multi-Task Learning
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2194/
PDF	https://www.aclweb.org/anthology/S19-2194
PWC	https://paperswithcode.com/paper/columbia-at-semeval-2019-task-7-multi-task
Repo
Framework

Neural Network Prediction of Censorable Language


Title	Neural Network Prediction of Censorable Language
Authors	Kei Yin Ng, Anna Feldman, Jing Peng, Chris Leberknight
Abstract	Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House's annual Freedom on the Net report, more than half the world's Internet users now live in a place where the Internet is censored or restricted. China has built the world's most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention 'sensitive' topics or authored by 'sensitive' users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with an 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2105/
PDF	https://www.aclweb.org/anthology/W19-2105
PWC	https://paperswithcode.com/paper/neural-network-prediction-of-censorable
Repo
Framework

SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions


Title	SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Authors	Miguel A. García-Cumbreras, Salud María Jiménez-Zafra, Arturo Montejo-Ráez, Manuel Carlos Díaz-Galiano, Estela Saquete
Abstract	This paper describes the participation of the SINAI-DL team at RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply into either supporting, denying, questioning or commenting on the underlying rumours. We have applied data augmentation, temporal expressions labelling and transfer learning with a four-layer neural classifier. We achieve an accuracy of 0.715 with the official run over reply tweets.
Tasks	Data Augmentation, Rumour Detection, Transfer Learning
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2196/
PDF	https://www.aclweb.org/anthology/S19-2196
PWC	https://paperswithcode.com/paper/sinai-dl-at-semeval-2019-task-7-data
Repo
Framework

UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification


Title	UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification
Authors	Bilal Ghanem, Alessandra Teresa Cignarella, Cristina Bosco, Paolo Rosso, Francisco Manuel Rangel Pardo
Abstract	In the present paper we describe the UPV-28-UNITO system's submission to the RumorEval 2019 shared task. The approach we applied for addressing both the subtasks of the contest exploits both classical machine learning algorithms and word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-structural and Twitter-based. A novel set of features that take advantage of the syntactic information in texts is moreover introduced in the paper.
Tasks	Word Embeddings
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2197/
PDF	https://www.aclweb.org/anthology/S19-2197
PWC	https://paperswithcode.com/paper/upv-28-unito-at-semeval-2019-task-7
Repo
Framework

Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation


Title	Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation
Authors	Jian Liang, Ran He, Zhenan Sun, Tieniu Tan
Abstract	Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods highly rely on large-size source domains and are computationally expensive to train, while subspace learning methods always have a quadratic time complexity that suffers from the large domain size. This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source setting and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our methods yield state-of-the-art results in various domain adaptation tasks.
Tasks	Domain Adaptation, Domain Generalization, Face Recognition
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Liang_Distant_Supervised_Centroid_Shift_A_Simple_and_Efficient_Approach_to_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Liang_Distant_Supervised_Centroid_Shift_A_Simple_and_Efficient_Approach_to_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/distant-supervised-centroid-shift-a-simple
Repo
Framework

CodeForTheChange at SemEval-2019 Task 8: Skip-Thoughts for Fact Checking in Community Question Answering


Title	CodeForTheChange at SemEval-2019 Task 8: Skip-Thoughts for Fact Checking in Community Question Answering
Authors	Adithya Avvaru, Anupam Pandey
Abstract	The strengths of the scalable gradient tree boosting algorithm, XGBoost, and the distributed sentence encoder, Skip-Thought Vectors, have not yet been explored by the cQA research community. We tried to apply and combine these two effective methods for finding the factual nature of the questions and answers. The work also includes experimentation with other popular classifier models like AdaBoost Classifier, DecisionTree Classifier, RandomForest Classifier, ExtraTrees Classifier, XGBoost Classifier and Multi-layer Neural Network. In this paper, we present the features used, approaches followed for feature engineering, models experimented with and finally the results.
Tasks	Community Question Answering, Feature Engineering, Question Answering
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2199/
PDF	https://www.aclweb.org/anthology/S19-2199
PWC	https://paperswithcode.com/paper/codeforthechange-at-semeval-2019-task-8-skip
Repo
Framework

ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning


Title	ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning
Authors	Tuhin Chakrabarty, Smaranda Muresan
Abstract	Community Question Answering forums are very popular nowadays, as they represent effective means for communities to share information around particular topics. But the information shared on these forums is often not authentic. This paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. We show how fine-tuning a language model on a large unannotated corpus of old threads from Qatar Living forum helps us to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Our system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy.
Tasks	Community Question Answering, Language Modelling, Question Answering
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2200/
PDF	https://www.aclweb.org/anthology/S19-2200
PWC	https://paperswithcode.com/paper/columbianlp-at-semeval-2019-task-8-the-answer
Repo
Framework