Paper Group NANR 259
Combining adaptive algorithms and hypergradient method: a performance and robustness study. Better Modeling of Incomplete Annotations for Named Entity Recognition. Hierarchical Self-Attention Network for Action Localization in Videos. Data-Driven Morphological Analysis for Uralic Languages. Critical Learning Periods in Deep Networks. SSF-DAN: Separ …
Combining adaptive algorithms and hypergradient method: a performance and robustness study
Title | Combining adaptive algorithms and hypergradient method: a performance and robustness study |
Authors | Akram Erraqabi, Nicolas Le Roux |
Abstract | Wilson et al. (2017) showed that, when the stepsize schedule is properly designed, stochastic gradient generalizes better than ADAM (Kingma & Ba, 2014). In light of recent work on hypergradient methods (Baydin et al., 2018), we revisit these claims to see if such methods close the gap between the most popular optimizers. As a byproduct, we analyze the true benefit of these hypergradient methods compared to more classical schedules, such as the fixed decay of Wilson et al. (2017). In particular, we observe they are of marginal help since their performance varies significantly when tuning their hyperparameters. Finally, as robustness is a critical quality of an optimizer, we provide a sensitivity analysis of these gradient based optimizers to assess how challenging their tuning is. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJgSV3AqKQ |
https://openreview.net/pdf?id=rJgSV3AqKQ | |
PWC | https://paperswithcode.com/paper/combining-adaptive-algorithms-and |
Repo | |
Framework | |
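For readers unfamiliar with the hypergradient idea the abstract refers to, here is a minimal sketch of the step-size update described by Baydin et al. (2018), applied to plain gradient descent on a toy quadratic. It is not the experimental setup of Erraqabi and Le Roux. The objective, the function names (`sgd_hd`, `quadratic_grad`, `quadratic_loss`), and the values of `alpha`, `beta`, and `steps` are all assumptions chosen for illustration, and the gradients are exact rather than stochastic.

```python
def sgd_hd(grad_fn, theta, alpha=0.01, beta=1e-4, steps=200):
    """Gradient descent whose step size alpha is adapted by a hypergradient rule."""
    prev_grad = [0.0] * len(theta)
    alphas = []
    for _ in range(steps):
        grad = grad_fn(theta)
        # Hypergradient step: alpha grows when consecutive gradients align
        # and shrinks when they point in opposing directions.
        alpha += beta * sum(g * p for g, p in zip(grad, prev_grad))
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        prev_grad = grad
        alphas.append(alpha)
    return theta, alphas


def quadratic_loss(theta):
    # Hypothetical toy objective: f(theta) = 0.5 * (theta_0^2 + 10 * theta_1^2)
    return 0.5 * (theta[0] ** 2 + 10.0 * theta[1] ** 2)


def quadratic_grad(theta):
    # Exact gradient of quadratic_loss
    return [theta[0], 10.0 * theta[1]]


start = [1.0, 1.0]
final_theta, alphas = sgd_hd(quadratic_grad, start)
print(quadratic_loss(start), quadratic_loss(final_theta), alphas[-1])
```

The extra hyperparameter `beta` is the quantity whose tuning sensitivity the abstract questions: the rule removes the need for a hand-designed schedule but introduces a new value that still has to be chosen.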
Better Modeling of Incomplete Annotations for Named Entity Recognition
Title | Better Modeling of Incomplete Annotations for Named Entity Recognition |
Authors | Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li |
Abstract | Supervised approaches to named entity recognition (NER) are largely developed based on the assumption that the training data is fully annotated with named entity information. However, in practice, annotated data can often be imperfect with one typical issue being the training data may contain incomplete annotations. We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. We demonstrate the effectiveness of our approach through extensive experiments. |
Tasks | Named Entity Recognition |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1079/ |
https://www.aclweb.org/anthology/N19-1079 | |
PWC | https://paperswithcode.com/paper/better-modeling-of-incomplete-annotations-for |
Repo | |
Framework | |
Hierarchical Self-Attention Network for Action Localization in Videos
Title | Hierarchical Self-Attention Network for Action Localization in Videos |
Authors | Rizard Renanda Adhi Pramono, Yie-Tarng Chen, Wen-Hsien Fang |
Abstract | This paper presents a novel Hierarchical Self-Attention Network (HISAN) to generate spatial-temporal tubes for action localization in videos. The essence of HISAN is to combine the two-stream convolutional neural network (CNN) with a hierarchical bidirectional self-attention mechanism, which comprises two levels of bidirectional self-attention to efficaciously capture both the long-term temporal dependency information and the spatial context information to render more precise action localization. Also, a sequence rescoring (SR) algorithm is employed to resolve the dilemma of inconsistent detection scores incurred by occlusion or background clutter. Moreover, a new fusion scheme is invoked, which integrates not only the appearance and motion information from the two-stream network, but also the motion saliency to mitigate the effect of camera motion. Simulations reveal that the new approach achieves performance competitive with state-of-the-art works in terms of action localization and recognition accuracy on the widely used UCF101-24 and J-HMDB datasets. |
Tasks | Action Localization |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Pramono_Hierarchical_Self-Attention_Network_for_Action_Localization_in_Videos_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Pramono_Hierarchical_Self-Attention_Network_for_Action_Localization_in_Videos_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-self-attention-network-for |
Repo | |
Framework | |
Data-Driven Morphological Analysis for Uralic Languages
Title | Data-Driven Morphological Analysis for Uralic Languages |
Authors | Miikka Silfverberg, Francis Tyers |
Abstract | |
Tasks | Lemmatization, Morphological Analysis |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0301/ |
https://www.aclweb.org/anthology/W19-0301 | |
PWC | https://paperswithcode.com/paper/data-driven-morphological-analysis-for-uralic |
Repo | |
Framework | |
Critical Learning Periods in Deep Networks
Title | Critical Learning Periods in Deep Networks |
Authors | Alessandro Achille, Matteo Rovere, Stefano Soatto |
Abstract | Similar to humans and animals, deep artificial neural networks exhibit critical periods during which a temporary stimulus deficit can impair the development of a skill. The extent of the impairment depends on the onset and length of the deficit window, as in animal models, and on the size of the neural network. Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training. To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. Counterintuitively, information rises rapidly in the early phases of training, and then decreases, preventing redistribution of information resources in a phenomenon we refer to as a loss of “Information Plasticity”. Our analysis suggests that the first few epochs are critical for the creation of strong connections that are optimal relative to the input data distribution. Once such strong connections are created, they do not appear to change during additional training. These findings suggest that the initial learning transient, under-scrutinized compared to asymptotic behavior, plays a key role in determining the outcome of the training process. Our findings, combined with recent theoretical results in the literature, also suggest that forgetting (decrease of information in the weights) is critical to achieving invariance and disentanglement in representation learning. Finally, critical periods are not restricted to biological systems, but can emerge naturally in learning systems, whether biological or artificial, due to fundamental constraints arising from learning dynamics and information processing. |
Tasks | Representation Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BkeStsCcKQ |
https://openreview.net/pdf?id=BkeStsCcKQ | |
PWC | https://paperswithcode.com/paper/critical-learning-periods-in-deep-networks |
Repo | |
Framework | |
SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation
Title | SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation |
Authors | Liang Du, Jingang Tan, Hongye Yang, Jianfeng Feng, Xiangyang Xue, Qibao Zheng, Xiaoqing Ye, Xiaolin Zhang |
Abstract | Despite the great success achieved by supervised fully convolutional models in semantic segmentation, training the models requires a large amount of labor-intensive work to generate pixel-level annotations. Recent works exploit synthetic data to train the model for semantic segmentation, but the domain adaptation between real and synthetic images remains a challenging problem. In this work, we propose a Separated Semantic Feature based domain adaptation network, named SSF-DAN, for semantic segmentation. First, a Semantic-wise Separable Discriminator (SS-D) is designed to independently adapt semantic features across the target and source domains, which addresses the inconsistent adaptation issue in the class-wise adversarial learning. In SS-D, a progressive confidence strategy is included to achieve a more reliable separation. Then, an efficient Class-wise Adversarial loss Reweighting module (CA-R) is introduced to balance the class-wise adversarial learning process, which leads the generator to focus more on poorly adapted classes. The presented framework demonstrates robust performance, superior to state-of-the-art methods on benchmark datasets. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Du_SSF-DAN_Separated_Semantic_Feature_Based_Domain_Adaptation_Network_for_Semantic_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Du_SSF-DAN_Separated_Semantic_Feature_Based_Domain_Adaptation_Network_for_Semantic_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/ssf-dan-separated-semantic-feature-based |
Repo | |
Framework | |
BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation
Title | BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation |
Authors | Ruoyao Yang, Wanying Xie, Chunhua Liu, Dong Yu |
Abstract | Researchers have been paying increasing attention to rumour evaluation due to the rapid spread of unsubstantiated rumours on social media platforms, including SemEval 2019 task 7. However, labelled data for learning rumour veracity is scarce, and labels in rumour stance data are highly disproportionate, making it challenging for a model to perform supervised-learning adequately. We propose an inference chain-based system, which fully utilizes conversation structure-based knowledge in the limited data and expands the training data in minority categories to alleviate class imbalance. Our approach obtains a 12.6% improvement upon the baseline system for subtask A, ranks 1st among 21 systems in subtask A, and ranks 4th among 12 systems in subtask B. |
Tasks | Rumour Detection |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2191/ |
https://www.aclweb.org/anthology/S19-2191 | |
PWC | https://paperswithcode.com/paper/blcu_nlp-at-semeval-2019-task-7-an-inference |
Repo | |
Framework | |
DON’T JUDGE A BOOK BY ITS COVER - ON THE DYNAMICS OF RECURRENT NEURAL NETWORKS
Title | DON’T JUDGE A BOOK BY ITS COVER - ON THE DYNAMICS OF RECURRENT NEURAL NETWORKS |
Authors | Doron Haviv, Alexander Rivkind, Omri Barak |
Abstract | To be effective in sequential data processing, Recurrent Neural Networks (RNNs) are required to keep track of past events by creating memories. Consequently, RNNs are harder to train than their feedforward counterparts, prompting the development of both dedicated units such as LSTM and GRU and of a handful of training tricks. In this paper, we investigate the effect of different training protocols on the representation of memories in RNNs. While reaching similar performance for different protocols, RNNs are shown to exhibit substantial differences in their ability to generalize for unforeseen tasks or conditions. We analyze the dynamics of the network’s hidden state, and uncover the reasons for this difference. Each memory is found to be associated with a nearly steady state of the dynamics whose speed predicts performance on unforeseen tasks and which we refer to as a ‘slow point’. By tracing the formation of the slow points we are able to understand the origin of differences between training protocols. Our results show that multiple solutions to the same task exist but may rely on different dynamical mechanisms, and that training protocols can bias the choice of such solutions in an interpretable way. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1z_Z2A5tX |
https://openreview.net/pdf?id=H1z_Z2A5tX | |
PWC | https://paperswithcode.com/paper/dont-judge-a-book-by-its-cover-on-the |
Repo | |
Framework | |
Columbia at SemEval-2019 Task 7: Multi-task Learning for Stance Classification and Rumour Verification
Title | Columbia at SemEval-2019 Task 7: Multi-task Learning for Stance Classification and Rumour Verification |
Authors | Zhuoran Liu, Shivali Goel, Mukund Yelahanka Raghuprasad, Smaranda Muresan |
Abstract | The paper presents the Columbia team's participation in the SemEval 2019 Shared Task 7: RumourEval 2019. Detecting rumours on social networks has been a focus of research in recent years. Previous work suffered from data sparsity, which potentially limited the application of more sophisticated neural architectures to this task. We mitigate this problem by proposing a multi-task learning approach together with language model fine-tuning. Our attention-based model allows different tasks to leverage different levels of information. Our system ranked 6th overall with an F1-score of 36.25 on stance classification and F1 of 22.44 on rumour verification. |
Tasks | Language Modelling, Multi-Task Learning |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2194/ |
https://www.aclweb.org/anthology/S19-2194 | |
PWC | https://paperswithcode.com/paper/columbia-at-semeval-2019-task-7-multi-task |
Repo | |
Framework | |
Neural Network Prediction of Censorable Language
Title | Neural Network Prediction of Censorable Language |
Authors | Kei Yin Ng, Anna Feldman, Jing Peng, Chris Leberknight |
Abstract | Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House's annual Freedom on the Net report, more than half the world's Internet users now live in a place where the Internet is censored or restricted. China has built the world's most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention 'sensitive' topics or are authored by 'sensitive' users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with an accuracy of 88.50% using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-2105/ |
https://www.aclweb.org/anthology/W19-2105 | |
PWC | https://paperswithcode.com/paper/neural-network-prediction-of-censorable |
Repo | |
Framework | |
SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Title | SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions |
Authors | Miguel A. García-Cumbreras, Salud María Jiménez-Zafra, Arturo Montejo-Ráez, Manuel Carlos Díaz-Galiano, Estela Saquete |
Abstract | This paper describes the participation of the SINAI-DL team at RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply into either supporting, denying, questioning or commenting on the underlying rumours. We have applied data augmentation, temporal expressions labelling and transfer learning with a four-layer neural classifier. We achieve an accuracy of 0.715 with the official run over reply tweets. |
Tasks | Data Augmentation, Rumour Detection, Transfer Learning |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2196/ |
https://www.aclweb.org/anthology/S19-2196 | |
PWC | https://paperswithcode.com/paper/sinai-dl-at-semeval-2019-task-7-data |
Repo | |
Framework | |
UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification
Title | UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification |
Authors | Bilal Ghanem, Alessandra Teresa Cignarella, Cristina Bosco, Paolo Rosso, Francisco Manuel Rangel Pardo |
Abstract | In the present paper we describe the UPV-28-UNITO system's submission to the RumorEval 2019 shared task. The approach we applied for addressing both the subtasks of the contest exploits both classical machine learning algorithms and word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-structural and Twitter-based. A novel set of features that take advantage of the syntactic information in texts is moreover introduced in the paper. |
Tasks | Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2197/ |
https://www.aclweb.org/anthology/S19-2197 | |
PWC | https://paperswithcode.com/paper/upv-28-unito-at-semeval-2019-task-7 |
Repo | |
Framework | |
Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation
Title | Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation |
Authors | Jian Liang, Ran He, Zhenan Sun, Tieniu Tan |
Abstract | Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods highly rely on large-size source domains and are computationally expensive to train, while subspace learning methods always have a quadratic time complexity that suffers from the large domain size. This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source setting and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our methods yield state-of-the-art results in various domain adaptation tasks. |
Tasks | Domain Adaptation, Domain Generalization, Face Recognition |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Liang_Distant_Supervised_Centroid_Shift_A_Simple_and_Efficient_Approach_to_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Liang_Distant_Supervised_Centroid_Shift_A_Simple_and_Efficient_Approach_to_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/distant-supervised-centroid-shift-a-simple |
Repo | |
Framework | |
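The abstract says the method builds on the nearest centroid classifier and alternates between pseudo target labels and target centroids. The sketch below shows only that base classifier plus a single pseudo-labelling pass; it omits the subspace learning and the unified objective, so it should not be read as the authors' algorithm. The helper names (`fit_centroids`, `predict`) and the toy 2-D points are hypothetical.

```python
def fit_centroids(points, labels):
    """Return a dict mapping each label to the mean of its points."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical labelled source domain with two classes.
source_x = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
source_y = ["a", "a", "b", "b"]
source_centroids = fit_centroids(source_x, source_y)

# Hypothetical unlabelled target domain, shifted by (+1, +1).
target_x = [(1.0, 1.0), (1.0, 3.0), (5.0, 1.0), (5.0, 3.0)]

# One pseudo-labelling pass: label targets with the source centroids,
# then re-estimate centroids from the target points alone.
pseudo_y = [predict(source_centroids, x) for x in target_x]
target_centroids = fit_centroids(target_x, pseudo_y)
print(pseudo_y, target_centroids)
```

Note that once `source_centroids` is computed, the pseudo-labelling pass touches only the centroids and the target points, which loosely mirrors the abstract's point about not accessing the source data during adaptation.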
CodeForTheChange at SemEval-2019 Task 8: Skip-Thoughts for Fact Checking in Community Question Answering
Title | CodeForTheChange at SemEval-2019 Task 8: Skip-Thoughts for Fact Checking in Community Question Answering |
Authors | Adithya Avvaru, Anupam Pandey |
Abstract | The strengths of the scalable gradient tree boosting algorithm, XGBoost, and the distributed sentence encoder, Skip-Thought Vectors, have not yet been explored by the cQA research community. We tried to apply and combine these two effective methods for determining the factual nature of the questions and answers. The work also includes experimentation with other popular classifier models like AdaBoost Classifier, DecisionTree Classifier, RandomForest Classifier, ExtraTrees Classifier, XGBoost Classifier and Multi-layer Neural Network. In this paper, we present the features used, approaches followed for feature engineering, models experimented with and finally the results. |
Tasks | Community Question Answering, Feature Engineering, Question Answering |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2199/ |
https://www.aclweb.org/anthology/S19-2199 | |
PWC | https://paperswithcode.com/paper/codeforthechange-at-semeval-2019-task-8-skip |
Repo | |
Framework | |
ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning
Title | ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning |
Authors | Tuhin Chakrabarty, Smaranda Muresan |
Abstract | Community Question Answering forums are very popular nowadays, as they represent effective means for communities to share information around particular topics. But the information shared on these forums is often not authentic. This paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. We show how fine-tuning a language model on a large unannotated corpus of old threads from the Qatar Living forum helps us to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Our system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy. |
Tasks | Community Question Answering, Language Modelling, Question Answering |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2200/ |
https://www.aclweb.org/anthology/S19-2200 | |
PWC | https://paperswithcode.com/paper/columbianlp-at-semeval-2019-task-8-the-answer |
Repo | |
Framework | |