Paper Group NANR 247
DEEP-TRIM: REVISITING L1 REGULARIZATION FOR CONNECTION PRUNING OF DEEP NETWORK. Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies. NLP at SemEval-2019 Task 6: Detecting Offensive language using Neural Networks. Explainable Artificial Intelligence and its potential within Industry. Action Assessment by Joint Relation Graphs …
DEEP-TRIM: REVISITING L1 REGULARIZATION FOR CONNECTION PRUNING OF DEEP NETWORK
Title | DEEP-TRIM: REVISITING L1 REGULARIZATION FOR CONNECTION PRUNING OF DEEP NETWORK |
Authors | Chih-Kuan Yeh, Ian E.H. Yen, Hong-You Chen, Chun-Pei Yang, Shou-De Lin, Pradeep Ravikumar |
Abstract | State-of-the-art deep neural networks (DNNs) typically have tens of millions of parameters, which might not fit into the upper levels of the memory hierarchy, thus increasing the inference time and energy consumption significantly, and prohibiting their use on edge devices such as mobile phones. The compression of DNN models has therefore become an active area of research recently, with *connection pruning* emerging as one of the most successful strategies. A very natural approach is to prune connections of DNNs via $\ell_1$ regularization, but recent empirical investigations have suggested that this does not work as well in the context of DNN compression. In this work, we revisit this simple strategy and analyze it rigorously, to show that: (a) any *stationary point* of an $\ell_1$-regularized layerwise-pruning objective has its number of non-zero elements bounded by the number of penalized prediction logits, regardless of the strength of the regularization; (b) successful pruning highly relies on an accurate optimization solver, and there is a trade-off between compression speed and distortion of prediction accuracy, controlled by the strength of regularization. Our theoretical results thus suggest that $\ell_1$ pruning could be successful provided we use an accurate optimization solver. We corroborate this in our experiments, where we show that simple $\ell_1$ regularization with an Adamax-L1(cumulative) solver gives pruning ratios competitive with the state-of-the-art. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1exVhActQ |
https://openreview.net/pdf?id=r1exVhActQ | |
PWC | https://paperswithcode.com/paper/deep-trim-revisiting-l1-regularization-for |
Repo | |
Framework | |
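The ℓ1-pruning recipe the abstract argues for can be illustrated with a proximal (soft-thresholding) step applied after each optimizer update. The PyTorch sketch below is a minimal illustration under that assumption, not the paper's Adamax-L1(cumulative) solver; the layer, learning rate, and regularization strength are placeholders.

```python
import torch

def l1_prune_step(layer, optimizer, loss, lam):
    """One training step with an L1 proximal (soft-threshold) update on `layer`:
    the optimizer takes its gradient step, then weights are shrunk toward zero
    and sufficiently small weights become exactly zero."""
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        w = layer.weight
        lr = optimizer.param_groups[0]["lr"]
        # prox of lam * ||w||_1 with step size lr (soft-thresholding)
        w.copy_(torch.sign(w) * torch.clamp(w.abs() - lr * lam, min=0.0))

# Illustrative usage on a single linear layer (all sizes are made up):
layer = torch.nn.Linear(512, 10)
optimizer = torch.optim.Adamax(layer.parameters(), lr=2e-3)
x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))
l1_prune_step(layer, optimizer, torch.nn.functional.cross_entropy(layer(x), y), lam=1e-4)
sparsity = (layer.weight == 0).float().mean().item()  # fraction of pruned connections
```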
Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies
Title | Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies |
Authors | Alessandra Teresa Cignarella, Cristina Bosco, Paolo Rosso |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7723/ |
https://www.aclweb.org/anthology/W19-7723 | |
PWC | https://paperswithcode.com/paper/presenting-twittiro-ud-an-italian-twitter |
Repo | |
Framework | |
NLP at SemEval-2019 Task 6: Detecting Offensive language using Neural Networks
Title | NLP at SemEval-2019 Task 6: Detecting Offensive language using Neural Networks |
Authors | Prashant Kapil, Asif Ekbal, Dipankar Das |
Abstract | In this paper we built several deep learning architectures to participate in the SemEval-2019 shared task OffensEval: Identifying and Categorizing Offensive Language in Social Media. The dataset was annotated with a three-level annotation scheme, and the task was to distinguish offensive from non-offensive content, categorize the type of offence, and identify its target. Deep learning models with POS information as a feature were also leveraged for classification. The models that performed best on the individual sub-tasks were a stacked CNN-Bi-LSTM with Attention, a Bi-LSTM with POS information added to the word features, and a Bi-LSTM for the third sub-task. Our models achieved Macro F1 scores of 0.7594, 0.5378 and 0.4588 on Tasks A, B and C respectively, ranking 33rd, 54th and 52nd out of 103, 75 and 65 submissions. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2105/ |
https://www.aclweb.org/anthology/S19-2105 | |
PWC | https://paperswithcode.com/paper/nlp-at-semeval-2019-task-6-detecting |
Repo | |
Framework | |
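As a rough illustration of the kind of Bi-LSTM classifier the submission describes (the exact stacking, attention, and POS-feature handling are not reproduced), a minimal PyTorch sketch with placeholder vocabulary and layer sizes might look like this:

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Generic BiLSTM text classifier; all sizes are illustrative assumptions."""
    def __init__(self, vocab_size=20000, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)              # (batch, seq, emb_dim)
        _, (h, _) = self.lstm(x)               # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)    # concat forward/backward final states
        return self.fc(h)                      # (batch, n_classes)

# Example: classify a batch of 4 padded tweets of length 30
logits = BiLSTMClassifier()(torch.randint(1, 20000, (4, 30)))
```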
Explainable Artificial Intelligence and its potential within Industry
Title | Explainable Artificial Intelligence and its potential within Industry |
Authors | Saad Mahamood |
Abstract | |
Tasks | |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-8401/ |
https://www.aclweb.org/anthology/W19-8401 | |
PWC | https://paperswithcode.com/paper/explainable-artificial-intelligence-and-its |
Repo | |
Framework | |
Action Assessment by Joint Relation Graphs
Title | Action Assessment by Joint Relation Graphs |
Authors | Jia-Hui Pan, Jibin Gao, Wei-Shi Zheng |
Abstract | We present a new model to assess the performance of actions from videos, through graph-based joint relation modelling. Previous works mainly focused on the whole scene including the performer’s body and background, yet they ignored the detailed joint interactions. This is insufficient for fine-grained, accurate action assessment, because the action quality of each joint is dependent on its neighbouring joints. Therefore, we propose to learn the detailed joint motion based on the joint relations. We build trainable Joint Relation Graphs, and analyze joint motion on them. We propose two novel modules, the Joint Commonality Module and the Joint Difference Module, for joint motion learning. The Joint Commonality Module models the general motion for certain body parts, and the Joint Difference Module models the motion differences within body parts. We evaluate our method on six public Olympic actions for performance assessment. Our method outperforms previous approaches (+0.0912) and the whole-scene analysis (+0.0623) in the Spearman’s Rank Correlation. We also demonstrate our model’s ability to interpret the action assessment process. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Pan_Action_Assessment_by_Joint_Relation_Graphs_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Pan_Action_Assessment_by_Joint_Relation_Graphs_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/action-assessment-by-joint-relation-graphs |
Repo | |
Framework | |
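The graph-based joint relation idea can be sketched with a trainable adjacency matrix over body joints and two branches that loosely mirror the commonality/difference split. The PyTorch snippet below is an illustrative approximation with made-up joint counts and feature dimensions, not the paper's exact modules:

```python
import torch
import torch.nn as nn

class JointRelationSketch(nn.Module):
    """Rough sketch of joint relation modelling: a trainable relation graph over
    joints, a 'commonality' branch aggregating related joints, and a 'difference'
    branch over deviations from that aggregate. Sizes are placeholders."""
    def __init__(self, n_joints=17, feat_dim=64):
        super().__init__()
        self.adj = nn.Parameter(torch.rand(n_joints, n_joints))  # trainable relation graph
        self.common = nn.Linear(feat_dim, feat_dim)
        self.diff = nn.Linear(feat_dim, feat_dim)

    def forward(self, joints):                       # joints: (batch, n_joints, feat_dim)
        a = torch.softmax(self.adj, dim=-1)          # normalise relations per joint
        common = self.common(a @ joints)             # shared motion of related joints
        diff = self.diff(joints - a @ joints)        # motion differences within parts
        return common + diff                         # (batch, n_joints, feat_dim)

out = JointRelationSketch()(torch.randn(2, 17, 64))
```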
A Progressive Model to Enable Continual Learning for Semantic Slot Filling
Title | A Progressive Model to Enable Continual Learning for Semantic Slot Filling |
Authors | Yilin Shen, Xiangyu Zeng, Hongxia Jin |
Abstract | Semantic slot filling is one of the major tasks in spoken language understanding (SLU). After a slot filling model is trained on precollected data, it is crucial to continually improve the model after deployment to learn users' new expressions. As the data amount grows, it becomes infeasible to either store such huge data and repeatedly retrain the model on all data, or fine-tune the model only on new data without forgetting old expressions. In this paper, we introduce a novel progressive slot filling model, ProgModel. ProgModel consists of a novel context gate that transfers previously learned knowledge to a small expanded component, and meanwhile enables this new component to be trained quickly on new data. As such, ProgModel learns the new knowledge using only new data at each stage, while preserving the previously learned expressions. Our experiments show that ProgModel needs much less training time and a smaller model size to outperform various model fine-tuning competitors by up to 4.24% and 3.03% on two benchmark datasets. |
Tasks | Continual Learning, Slot Filling, Spoken Language Understanding |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1126/ |
https://www.aclweb.org/anthology/D19-1126 | |
PWC | https://paperswithcode.com/paper/a-progressive-model-to-enable-continual |
Repo | |
Framework | |
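A hedged sketch of the progressive-expansion idea (freeze the previously trained encoder, train a small new component, and mix the two through a context gate) is shown below in PyTorch; the encoder types, dimensions, and gating form are assumptions, not ProgModel's exact architecture:

```python
import torch
import torch.nn as nn

class ProgressiveSlotTagger(nn.Module):
    """Sketch of progressive expansion for slot filling: the old encoder is
    frozen, a small new encoder learns from new data, and a sigmoid context
    gate mixes the two hidden states before per-token slot tagging."""
    def __init__(self, old_encoder, emb_dim=100, old_dim=256, new_dim=32, n_slots=50):
        super().__init__()
        self.old_encoder = old_encoder            # frozen component from the previous stage
        for p in self.old_encoder.parameters():
            p.requires_grad = False
        self.new_encoder = nn.LSTM(emb_dim, new_dim, batch_first=True)
        self.gate = nn.Linear(old_dim + new_dim, new_dim)
        self.proj = nn.Linear(old_dim, new_dim)
        self.tagger = nn.Linear(new_dim, n_slots)

    def forward(self, embeddings):                # (batch, seq, emb_dim)
        old_h, _ = self.old_encoder(embeddings)   # (batch, seq, old_dim)
        new_h, _ = self.new_encoder(embeddings)   # (batch, seq, new_dim)
        g = torch.sigmoid(self.gate(torch.cat([old_h, new_h], dim=-1)))
        mixed = g * self.proj(old_h) + (1 - g) * new_h
        return self.tagger(mixed)                 # per-token slot logits

old = nn.LSTM(100, 256, batch_first=True)         # stand-in for the previously trained encoder
logits = ProgressiveSlotTagger(old)(torch.randn(2, 12, 100))
```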
Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering
Title | Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering |
Authors | J. Edward Hu, Abhinav Singh, Nils Holzenberger, Matt Post, Benjamin Van Durme |
Abstract | Producing diverse paraphrases of a sentence is a challenging task. Natural paraphrase corpora are scarce and limited, while existing large-scale resources are automatically generated via back-translation and rely on beam search, which tends to lack diversity. We describe ParaBank 2, a new resource that contains multiple diverse sentential paraphrases, produced from a bilingual corpus using negative constraints, inference sampling, and clustering. We show that ParaBank 2 significantly surpasses prior work in both lexical and syntactic diversity while being meaning-preserving, as measured by human judgments and standardized metrics. Further, we illustrate how such paraphrastic resources may be used to refine contextualized encoders, leading to improvements in downstream tasks. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/K19-1005/ |
https://www.aclweb.org/anthology/K19-1005 | |
PWC | https://paperswithcode.com/paper/large-scale-diverse-paraphrastic-bitexts-via |
Repo | |
Framework | |
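One simple way to realize the "diverse by sampling and clustering" idea is to over-generate paraphrase candidates and then keep only candidates whose embeddings are far apart. The sketch below uses a greedy farthest-point heuristic as a stand-in for the paper's actual clustering step; the embeddings and the number of kept candidates are placeholders:

```python
import torch

def select_diverse(embeddings, k):
    """Greedily keep k candidates whose (normalised) embeddings are mutually
    dissimilar; a simple diversity heuristic, not ParaBank 2's exact procedure."""
    e = torch.nn.functional.normalize(embeddings, dim=-1)     # (n, d)
    chosen = [0]                                              # start from the first sample
    for _ in range(k - 1):
        sims = e @ e[chosen].t()                              # (n, len(chosen))
        nearest = sims.max(dim=1).values                      # similarity to nearest chosen
        nearest[chosen] = float("inf")                        # never re-pick a chosen item
        chosen.append(int(nearest.argmin()))                  # pick the most dissimilar
    return chosen

picked = select_diverse(torch.randn(100, 300), k=5)           # indices of 5 diverse candidates
```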
Efficient Deep Approximation of GMMs
Title | Efficient Deep Approximation of GMMs |
Authors | Shirin Jalali, Carl Nuzman, Iraj Saniee |
Abstract | The universal approximation theorem states that any regular function can be approximated closely using a single hidden layer neural network. Some recent work has shown that, for some special functions, the number of nodes in such an approximation could be exponentially reduced with multi-layer neural networks. In this work, we extend this idea to a rich class of functions, namely the discriminant functions that arise in optimal Bayesian classification of Gaussian mixture models (GMMs) in $\mathbb{R}^n$. We show that such functions can be approximated with arbitrary precision using $O(n)$ nodes in a neural network with two hidden layers (deep neural network), while in contrast, a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ nodes or exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our results shed light on the observed efficiency of deep neural networks in practical classification problems. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8704-efficient-deep-approximation-of-gmms |
http://papers.nips.cc/paper/8704-efficient-deep-approximation-of-gmms.pdf | |
PWC | https://paperswithcode.com/paper/efficient-deep-approximation-of-gmms |
Repo | |
Framework | |
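For reference, the discriminant functions in question are the log class scores of a Bayes classifier whose class-conditional densities are Gaussian mixtures; a standard form, written here from the usual definitions rather than copied from the paper, is:

```latex
% Bayes discriminant for class k, with class prior \pi_k and a GMM class-conditional
% density with M_k components (weights w_{k,m}, means \mu_{k,m}, covariances \Sigma_{k,m});
% the classifier predicts \arg\max_k g_k(x) for x \in \mathbb{R}^n.
g_k(x) = \log \pi_k + \log \sum_{m=1}^{M_k} w_{k,m}\,
         \mathcal{N}\!\left(x \mid \mu_{k,m}, \Sigma_{k,m}\right)
```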
Aggregating Bidirectional Encoder Representations Using MatchLSTM for Sequence Matching
Title | Aggregating Bidirectional Encoder Representations Using MatchLSTM for Sequence Matching |
Authors | Bo Shao, Yeyun Gong, Weizhen Qi, Nan Duan, Xiaola Lin |
Abstract | In this work, we propose an aggregation method that combines Bidirectional Encoder Representations from Transformers (BERT) with a MatchLSTM layer for sequence matching. Given a sentence pair, we extract its output representations from BERT. We then extend BERT with a MatchLSTM layer to model further interaction between the sentence pair for sequence matching tasks. Taking natural language inference as an example, we split the BERT output into two parts, one from the premise sentence and one from the hypothesis sentence. At each position of the hypothesis sentence, both the weighted representation of the premise sentence and the representation of the current token are fed into the LSTM. We jointly train the aggregation layer and the pre-trained layers for sequence matching. We conduct experiments on two publicly available datasets, WikiQA and SNLI. Experiments show that our model achieves significant improvements over state-of-the-art methods on both datasets. |
Tasks | Natural Language Inference |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1626/ |
https://www.aclweb.org/anthology/D19-1626 | |
PWC | https://paperswithcode.com/paper/aggregating-bidirectional-encoder |
Repo | |
Framework | |
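The aggregation step the abstract describes (attend over the premise at every hypothesis position, then run an LSTM over the concatenated vectors) can be sketched as follows in PyTorch; the BERT encoder is assumed to have been applied beforehand, and the dimensions and attention form are illustrative assumptions rather than the authors' exact configuration:

```python
import torch
import torch.nn as nn

class MatchLSTMAggregator(nn.Module):
    """Sketch of the aggregation idea: given BERT token vectors for a premise and
    a hypothesis, attend over the premise at each hypothesis position and run an
    LSTM over [attended premise; hypothesis token] before classification."""
    def __init__(self, bert_dim=768, hidden=300, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(2 * bert_dim, hidden, batch_first=True)
        self.classify = nn.Linear(hidden, n_classes)

    def forward(self, premise, hypothesis):           # (b, Lp, d), (b, Lh, d)
        scores = hypothesis @ premise.transpose(1, 2) # (b, Lh, Lp) attention scores
        attn = torch.softmax(scores, dim=-1)
        weighted_premise = attn @ premise             # (b, Lh, d)
        matched = torch.cat([weighted_premise, hypothesis], dim=-1)
        _, (h, _) = self.lstm(matched)                # final hidden state
        return self.classify(h[-1])                   # (b, n_classes)

logits = MatchLSTMAggregator()(torch.randn(2, 20, 768), torch.randn(2, 12, 768))
```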
Convolutional neural networks for low-resource morpheme segmentation: baseline or state-of-the-art?
Title | Convolutional neural networks for low-resource morpheme segmentation: baseline or state-of-the-art? |
Authors | Alexey Sorokin |
Abstract | We apply convolutional neural networks to the task of shallow morpheme segmentation using low-resource datasets for 5 different languages. We show that both in fully supervised and semi-supervised settings our model beats previous state-of-the-art approaches. We argue that convolutional neural networks reflect the local nature of morpheme segmentation better than other semi-supervised approaches. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4218/ |
https://www.aclweb.org/anthology/W19-4218 | |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-for-low |
Repo | |
Framework | |
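A character-level CNN tagger of the kind used for shallow morpheme segmentation can be sketched as below; the tag inventory (e.g. BMES boundary tags), character vocabulary, and layer sizes are placeholders rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class ConvMorphemeTagger(nn.Module):
    """Character-level CNN for shallow morpheme segmentation: stacked 1-D
    convolutions over character embeddings, one boundary tag per character."""
    def __init__(self, n_chars=60, emb_dim=32, channels=64, n_tags=4, kernel=5):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        self.convs = nn.Sequential(
            nn.Conv1d(emb_dim, channels, kernel, padding=kernel // 2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel, padding=kernel // 2), nn.ReLU(),
        )
        self.out = nn.Linear(channels, n_tags)

    def forward(self, char_ids):                      # (batch, word_len)
        x = self.embed(char_ids).transpose(1, 2)      # (batch, emb_dim, word_len)
        x = self.convs(x).transpose(1, 2)             # (batch, word_len, channels)
        return self.out(x)                            # per-character tag logits

tags = ConvMorphemeTagger()(torch.randint(1, 60, (8, 12))).argmax(-1)
```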
Incivility Detection in Online Comments
Title | Incivility Detection in Online Comments |
Authors | Farig Sadeque, Stephen Rains, Yotam Shmargad, Kate Kenski, Kevin Coe, Steven Bethard |
Abstract | Incivility in public discourse has been a major concern in recent times as it can affect the quality and tenacity of the discourse negatively. In this paper, we present neural models that can learn to detect name-calling and vulgarity from a newspaper comment section. We show that in contrast to prior work on detecting toxic language, fine-grained incivilities like name-calling cannot be accurately detected by simple models like logistic regression. We apply the models trained on the newspaper comments data to detect uncivil comments in a Russian troll dataset, and find that despite the change of domain, the model makes accurate predictions. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-1031/ |
https://www.aclweb.org/anthology/S19-1031 | |
PWC | https://paperswithcode.com/paper/incivility-detection-in-online-comments |
Repo | |
Framework | |
Evaluating BERT for natural language inference: A case study on the CommitmentBank
Title | Evaluating BERT for natural language inference: A case study on the CommitmentBank |
Authors | Nanjiang Jiang, Marie-Catherine de Marneffe |
Abstract | Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in the hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement. |
Tasks | Natural Language Inference |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1630/ |
https://www.aclweb.org/anthology/D19-1630 | |
PWC | https://paperswithcode.com/paper/evaluating-bert-for-natural-language |
Repo | |
Framework | |
Classifying Author Intention for Writer Feedback in Related Work
Title | Classifying Author Intention for Writer Feedback in Related Work |
Authors | Arlene Casey, Bonnie Webber, Dorota Glowacka |
Abstract | The ability to produce high-quality publishable material is critical to academic success but many Post-Graduate students struggle to learn to do so. While recent years have seen an increase in tools designed to provide feedback on aspects of writing, one aspect that has so far been neglected is the Related Work section of academic research papers. To address this, we have trained a supervised classifier on a corpus of 94 Related Work sections and evaluated it against a manually annotated gold standard. The classifier uses novel features pertaining to citation types and co-reference, along with patterns found from studying Related Works. We show that these novel features contribute to classifier performance with performance being favourable compared to other similar works that classify author intentions and consider feedback for academic writing. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1021/ |
https://www.aclweb.org/anthology/R19-1021 | |
PWC | https://paperswithcode.com/paper/classifying-author-intention-for-writer |
Repo | |
Framework | |
The Risk of Racial Bias in Hate Speech Detection
Title | The Risk of Racial Bias in Hate Speech Detection |
Authors | Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A. Smith |
Abstract | We investigate how annotators' insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and ratings of toxicity in several widely-used hate speech datasets. Then, we show that models trained on these corpora acquire and propagate these biases, such that AAE tweets and tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to others. Finally, we propose dialect and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet's dialect they are significantly less likely to label the tweet as offensive. |
Tasks | Hate Speech Detection |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1163/ |
https://www.aclweb.org/anthology/P19-1163 | |
PWC | https://paperswithcode.com/paper/the-risk-of-racial-bias-in-hate-speech |
Repo | |
Framework | |
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
Title | Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling |
Authors | Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin |
Abstract | A core problem of information retrieval (IR) is relevance matching, which is to rank documents by relevance to a user's query. On the other hand, many NLP problems, such as question answering and paraphrase identification, can be considered variants of semantic matching, which is to measure the semantic distance between two pieces of short texts. While at a high level both relevance and semantic matching require modeling textual similarity, many existing techniques for one cannot be easily adapted to the other. To bridge this gap, we propose a novel model, HCAN (Hybrid Co-Attention Network), that comprises (1) a hybrid encoder module that includes ConvNet-based and LSTM-based encoders, (2) a relevance matching module that measures soft term matches with importance weighting at multiple granularities, and (3) a semantic matching module with co-attention mechanisms that capture context-aware semantic relatedness. Evaluations on multiple IR and NLP benchmarks demonstrate state-of-the-art effectiveness compared to approaches that do not exploit pretraining on external data. Extensive ablation studies suggest that relevance and semantic matching signals are complementary across many problem settings, regardless of the choice of underlying encoders. |
Tasks | Information Retrieval, Paraphrase Identification, Question Answering |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1540/ |
https://www.aclweb.org/anthology/D19-1540 | |
PWC | https://paperswithcode.com/paper/bridging-the-gap-between-relevance-matching |
Repo | |
Framework | |
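Of HCAN's three modules, the relevance matching signal is the easiest to sketch: cosine similarities between query and document term vectors, pooled per query term and combined with importance weights. The snippet below is an illustrative approximation (the shapes and the choice of IDF-like weights are assumptions), not the released model:

```python
import torch

def relevance_matching_signal(query_vecs, doc_vecs, query_weights):
    """Soft term-matching sketch: cosine similarities between query and document
    term vectors, max- and mean-pooled over the document, then combined with
    per-term importance weights (e.g. IDF-like values)."""
    q = torch.nn.functional.normalize(query_vecs, dim=-1)   # (Lq, d)
    d = torch.nn.functional.normalize(doc_vecs, dim=-1)     # (Ld, d)
    sim = q @ d.t()                                         # (Lq, Ld) soft term matches
    per_term = torch.stack([sim.max(dim=1).values, sim.mean(dim=1)], dim=-1)  # (Lq, 2)
    return (query_weights.unsqueeze(-1) * per_term).sum(dim=0)  # weighted match features

signal = relevance_matching_signal(torch.randn(5, 300), torch.randn(40, 300), torch.rand(5))
```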