Paper Group AWR 36
Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media. Kernel Exponential Family Estimation via Doubly Dual Embedding. Compact Generalized Non-local Network. Multi-Task Graph Autoencoders. HyperAdam: A Learnable Task-Adaptive Adam for Network Training. Towards Exploiting Background Knowledge for Building Conversation Systems …
Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media
Title | Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media |
Authors | Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, Elizabeth Belding |
Abstract | While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech – its target: either “directed” towards a specific person or entity, or “generalized” towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate, is characterized by the use of lethal words such as murder, exterminate, and kill; and quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04257v1 |
http://arxiv.org/pdf/1804.04257v1.pdf | |
PWC | https://paperswithcode.com/paper/hate-lingo-a-target-based-linguistic-analysis |
Repo | https://github.com/ben-aaron188/ucl_aca_20182019 |
Framework | none |
Kernel Exponential Family Estimation via Doubly Dual Embedding
Title | Kernel Exponential Family Estimation via Doubly Dual Embedding |
Authors | Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He |
Abstract | We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sampling in the inference stage. The resulting estimator can be easily generalized to kernel conditional exponential families. We establish a connection between kernel exponential family estimation and MMD-GANs, revealing a new perspective for understanding GANs. Compared to the score matching based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate while the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms state-of-the-art |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02228v3 |
http://arxiv.org/pdf/1811.02228v3.pdf | |
PWC | https://paperswithcode.com/paper/kernel-exponential-family-estimation-via |
Repo | https://github.com/Hanjun-Dai/dde |
Framework | pytorch |
Compact Generalized Non-local Network
Title | Compact Generalized Non-local Network |
Authors | Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu |
Abstract | The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos. Although it has shown excellent performance, it lacks the mechanism to model the interactions between positions across channels, which are of vital importance in recognizing fine-grained objects and actions. To address this limitation, we generalize the non-local module and take the correlations between the positions of any two channels into account. This extension uses a compact representation of multiple kernel functions via Taylor expansion, which keeps the generalized non-local module fast and low in computational complexity. Moreover, we implement our generalized non-local method within channel groups to ease the optimization. Experimental results illustrate the clear-cut improvements and practical applicability of the generalized non-local module on both fine-grained object recognition and video classification. Code is available at: https://github.com/KaiyuYue/cgnl-network.pytorch. |
Tasks | Object Detection, Object Recognition, Video Classification |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13125v2 |
http://arxiv.org/pdf/1810.13125v2.pdf | |
PWC | https://paperswithcode.com/paper/compact-generalized-non-local-network |
Repo | https://github.com/KaiyuYue/cgnl-network.pytorch |
Framework | pytorch |
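The grouped channel-position attention described in the abstract above can be illustrated with a toy PyTorch module. This is only a sketch under simplifying assumptions: it uses a plain dot-product kernel in place of the paper's Taylor-expanded kernels, and the module name and layer sizes are invented for readability — the linked repository contains the actual CGNL implementation.

```python
import torch
import torch.nn as nn

class ToyGroupedNonLocal(nn.Module):
    """Illustrative sketch: attention over all channel-position pairs within each
    channel group, using a simple dot-product kernel instead of the paper's
    Taylor-expanded kernel functions."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.theta = nn.Conv2d(channels, channels, 1)
        self.phi = nn.Conv2d(channels, channels, 1)
        self.g = nn.Conv2d(channels, channels, 1)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        m = (c // self.groups) * h * w
        # Flatten each channel group into one long vector of channel-position responses.
        t = self.theta(x).view(n, self.groups, m)
        p = self.phi(x).view(n, self.groups, m)
        g = self.g(x).view(n, self.groups, m)
        # Dot-product kernel: y_i = theta_i * sum_j phi_j * g_j; associativity keeps the cost linear in m.
        scale = (p * g).sum(dim=-1, keepdim=True) / m
        y = (t * scale).view(n, c, h, w)
        return x + self.out(y)

x = torch.randn(2, 16, 8, 8)
print(ToyGroupedNonLocal(16)(x).shape)   # torch.Size([2, 16, 8, 8])
```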
Multi-Task Graph Autoencoders
Title | Multi-Task Graph Autoencoders |
Authors | Phi Vu Tran |
Abstract | We examine two fundamental tasks associated with graph representation learning: link prediction and node classification. We present a new autoencoder architecture capable of learning a joint representation of local graph structure and available node features for the simultaneous multi-task learning of unsupervised link prediction and semi-supervised node classification. Our simple, yet effective and versatile model is efficiently trained end-to-end in a single stage, whereas previous related deep graph embedding methods require multiple training steps that are difficult to optimize. We provide an empirical evaluation of our model on five benchmark relational, graph-structured datasets and demonstrate significant improvement over three strong baselines for graph representation learning. Reference code and data are available at https://github.com/vuptran/graph-representation-learning |
Tasks | Graph Embedding, Graph Representation Learning, Link Prediction, Multi-Task Learning, Node Classification, Representation Learning |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02798v1 |
http://arxiv.org/pdf/1811.02798v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-graph-autoencoders |
Repo | https://github.com/vuptran/graph-representation-learning |
Framework | tf |
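The shared-encoder, two-head setup described in the abstract can be sketched in a few lines. This is a hedged illustration only: the layer sizes, the concatenation of adjacency rows with node features, and the unweighted loss sum are assumptions for readability, not the paper's exact architecture (see the linked repo for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiTaskGraphAE(nn.Module):
    """Sketch of the multi-task idea: a shared encoder over each node's adjacency row
    (concatenated with its features) feeds a link-prediction decoder and a node
    classifier. Details here are illustrative, not the paper's."""
    def __init__(self, n_nodes, n_feats, n_classes, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_nodes + n_feats, hidden), nn.ReLU())
        self.link_decoder = nn.Linear(hidden, n_nodes)   # reconstruct the adjacency row
        self.classifier = nn.Linear(hidden, n_classes)   # predict the node label

    def forward(self, adj_row, feats):
        z = self.encoder(torch.cat([adj_row, feats], dim=-1))
        return self.link_decoder(z), self.classifier(z)

def multitask_loss(adj_logits, cls_logits, adj_row, labels, labeled_mask):
    # Unsupervised link reconstruction on every node; supervised classification on labelled nodes only.
    link = F.binary_cross_entropy_with_logits(adj_logits, adj_row)
    cls = F.cross_entropy(cls_logits[labeled_mask], labels[labeled_mask])
    return link + cls
```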
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Title | HyperAdam: A Learnable Task-Adaptive Adam for Network Training |
Authors | Shipeng Wang, Jian Sun, Zongben Xu |
Abstract | Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embodied in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is shown to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs, and LSTMs. |
Tasks | Stochastic Optimization |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.08996v1 |
http://arxiv.org/pdf/1811.08996v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperadam-a-learnable-task-adaptive-adam-for |
Repo | https://github.com/ShipengWang/HyperAdam-Tensorflow |
Framework | tf |
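A toy version of the combination described in the abstract above is sketched below. It only illustrates mixing several Adam-style updates with different decay rates; the fixed weights, the omitted bias correction, and the function name are simplifications — in HyperAdam the weights and decay rates are produced adaptively by the learned recurrent cells (AdamCell, WeightCell, StateCell).

```python
import torch

def mixed_adam_update(grad, state, betas=((0.9, 0.999), (0.8, 0.99), (0.95, 0.9999)),
                      weights=None, lr=1e-3, eps=1e-8):
    """Toy sketch: compute several Adam-style updates with different (beta1, beta2)
    pairs and return their weighted combination. Bias correction is omitted and the
    weights are fixed; HyperAdam learns them adaptively per task."""
    if weights is None:
        weights = [1.0 / len(betas)] * len(betas)
    update = torch.zeros_like(grad)
    for i, (b1, b2) in enumerate(betas):
        m, v = state.get(i, (torch.zeros_like(grad), torch.zeros_like(grad)))
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        state[i] = (m, v)
        update = update + weights[i] * m / (v.sqrt() + eps)
    return -lr * update

# Minimal usage on a stand-in objective ||w||^2:
state, w = {}, torch.randn(10)
for _ in range(3):
    w = w + mixed_adam_update(2 * w, state)
```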
Towards Exploiting Background Knowledge for Building Conversation Systems
Title | Towards Exploiting Background Knowledge for Building Conversation Systems |
Authors | Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra |
Abstract | Existing dialog datasets contain a sequence of utterances and responses without any explicit background knowledge associated with them. This has resulted in the development of models which treat conversation as a sequence-to-sequence generation task (i.e., given a sequence of utterances, generate the response sequence). This is not only an overly simplistic view of conversation but it is also emphatically different from the way humans converse by heavily relying on their background knowledge about the topic (as opposed to simply relying on the previous sequence of utterances). For example, it is common for humans to (involuntarily) produce utterances which are copied or suitably modified from background articles they have read about the topic. To facilitate the development of such natural conversation models which mimic the human process of conversing, we create a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie. We establish baseline results on this dataset (90K utterances from 9K conversations) using three different models: (i) pure generation based models which ignore the background knowledge, (ii) generation based models which learn to copy information from the background knowledge when required, and (iii) span prediction based models which predict the appropriate response span in the background knowledge. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08205v1 |
http://arxiv.org/pdf/1809.08205v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-exploiting-background-knowledge-for |
Repo | https://github.com/nikitacs16/Holl-E |
Framework | none |
Learning Global Additive Explanations for Neural Nets Using Model Distillation
Title | Learning Global Additive Explanations for Neural Nets Using Model Distillation |
Authors | Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo |
Abstract | Interpretability has largely focused on local explanations, i.e. explaining why a model made a particular prediction for a sample. These explanations are appealing due to their simplicity and local fidelity. However, they do not provide information about the general behavior of the model. We propose to leverage model distillation to learn global additive explanations that describe the relationship between input features and model predictions. These global explanations take the form of feature shapes, which are more expressive than feature attributions. Through careful experimentation, we show qualitatively and quantitatively that global additive explanations are able to describe model behavior and yield insights about models such as neural nets. A visualization of our approach applied to a neural net as it is trained is available at https://youtu.be/ErQYwNqzEdc. |
Tasks | |
Published | 2018-01-26 |
URL | http://arxiv.org/abs/1801.08640v2 |
http://arxiv.org/pdf/1801.08640v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-global-additive-explanations-for |
Repo | https://github.com/aclarkData/MachineLearningInterpretability |
Framework | none |
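The distillation step can be illustrated with a minimal backfitting loop that fits one "feature shape" per input feature to the teacher's predictions. This is a sketch under assumptions: the paper uses more expressive shapes (splines and bagged trees) and a more careful fitting procedure, so the polynomial shapes and function name here are only for illustration.

```python
import numpy as np

def distill_additive(teacher_predict, X, degree=3, n_iter=10):
    """Fit a global additive surrogate f(x) ~= b + sum_j s_j(x_j) to a black-box model
    by backfitting polynomial shape functions against the teacher's predictions."""
    y = teacher_predict(X)                       # teacher scores on the training inputs
    n, d = X.shape
    coefs = [np.zeros(degree + 1) for _ in range(d)]
    intercept = float(y.mean())
    for _ in range(n_iter):
        for j in range(d):
            # Residual after subtracting every other feature's current shape function.
            others = sum(np.polyval(coefs[k], X[:, k]) for k in range(d) if k != j)
            coefs[j] = np.polyfit(X[:, j], y - intercept - others, degree)
    return intercept, coefs   # plot np.polyval(coefs[j], grid) to inspect feature j's shape

# Usage with any trained model exposing a prediction function, e.g.:
# intercept, coefs = distill_additive(net_predict, X_train)
```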
Can recurrent neural networks warp time?
Title | Can recurrent neural networks warp time? |
Authors | Corentin Tallec, Yann Ollivier |
Abstract | Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1804.11188v1 |
http://arxiv.org/pdf/1804.11188v1.pdf | |
PWC | https://paperswithcode.com/paper/can-recurrent-neural-networks-warp-time |
Repo | https://github.com/AravindGanesh/ChronoLSTM |
Framework | none |
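The chrono initialization can be applied to a standard PyTorch LSTM in a few lines. The sketch below assumes a single-layer LSTM and PyTorch's gate ordering (input, forget, cell, output); `t_max` stands for the longest dependency length expected in the data.

```python
import torch
import torch.nn as nn

def chrono_init(lstm: nn.LSTM, t_max: int):
    """Chrono initialization sketch: forget-gate biases drawn as log(U[1, t_max - 1])
    and input-gate biases set to their negatives, so the gates start out attuned to
    dependencies up to roughly t_max steps. PyTorch packs LSTM biases as
    [input | forget | cell | output] blocks of size hidden_size."""
    h = lstm.hidden_size
    with torch.no_grad():
        lstm.bias_ih_l0.zero_()
        lstm.bias_hh_l0.zero_()                      # effective bias is the sum of both
        b = torch.empty(h).uniform_(1.0, t_max - 1).log()
        lstm.bias_ih_l0[h:2 * h] = b                 # forget gate
        lstm.bias_ih_l0[0:h] = -b                    # input gate

lstm = nn.LSTM(input_size=32, hidden_size=64)
chrono_init(lstm, t_max=100)
```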
Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting
Title | Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting |
Authors | Bin Wang, Jie Lu, Zheng Yan, Huaishao Luo, Tianrui Li, Yu Zheng, Guangquan Zhang |
Abstract | Weather forecasting is usually solved through numerical weather prediction (NWP), which can sometimes lead to unsatisfactory performance due to inappropriate setting of the initial states. In this paper, we design a data-driven method augmented by an effective information fusion mechanism to learn from historical data that incorporates prior knowledge from NWP. We cast the weather forecasting problem as an end-to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (NLE) loss function. A notable advantage of our proposed method is that it simultaneously implements single-value forecasting and uncertainty quantification, which we refer to as deep uncertainty quantification (DUQ). Efficient deep ensemble strategies are also explored to further improve performance. This new approach was evaluated on a public dataset collected from weather stations in Beijing, China. Experimental results demonstrate that the proposed NLE loss significantly improves generalization compared to mean squared error (MSE) loss and mean absolute error (MAE) loss. Compared with NWP, this approach significantly improves accuracy by 47.76%, which is a state-of-the-art result on this benchmark dataset. The preliminary version of the proposed method won 2nd place in an online competition for daily weather forecasting. |
Tasks | Weather Forecasting |
Published | 2018-12-22 |
URL | http://arxiv.org/abs/1812.09467v3 |
http://arxiv.org/pdf/1812.09467v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-uncertainty-quantification-a-machine |
Repo | https://github.com/BruceBinBoxing/WF |
Framework | tf |
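The loss the abstract refers to is, in spirit, a heteroscedastic negative log-likelihood: the network outputs a mean and a (log-)variance per target, and the loss trades squared error off against predicted uncertainty. The sketch below is a generic Gaussian NLL, not necessarily the paper's exact NLE parameterization.

```python
import torch

def gaussian_nll(mu, log_var, y):
    """Generic heteroscedastic negative log-likelihood (constant terms dropped):
    large errors cost less where the predicted variance is large, while the
    log-variance term penalizes blanket over-estimation of uncertainty."""
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

mu, log_var = torch.randn(8), torch.zeros(8)     # e.g. the two heads of a forecasting network
y = torch.randn(8)
loss = gaussian_nll(mu, log_var, y)
```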
Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT
Title | Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT |
Authors | Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, Jinsong Su |
Abstract | Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], the study of which has become a research hotspot in NMT. At present, these studies mainly apply the dominant automatic evaluation metrics, such as BLEU, to evaluate the overall translation quality with respect to both adequacy and fluency. However, they are unable to accurately measure the ability of NMT systems in dealing with the above-mentioned issues. In this paper, we propose two quantitative metrics, the Otem and Utem, to automatically evaluate the system performance in terms of over- and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between gold reference and system translation. We evaluate both metrics by comparing their scores with human evaluations, where the values of Pearson Correlation Coefficient reveal their strong correlation. Moreover, in-depth analyses on various translation systems indicate some inconsistency between BLEU and our proposed metrics, highlighting the necessity and significance of our metrics. |
Tasks | Machine Translation |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08945v1 |
http://arxiv.org/pdf/1807.08945v1.pdf | |
PWC | https://paperswithcode.com/paper/otemutem-over-and-under-translation |
Repo | https://github.com/DeepLearnXMU/Otem-Utem |
Framework | none |
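The idea behind the two metrics — scoring over- and under-translation from mismatched n-grams — can be sketched in plain Python. The exact definitions in the paper (n-gram orders, clipping, multi-reference handling) may differ; this is only a minimal single-reference illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def otem_utem(hypothesis, reference, n=2):
    """Over-translation: proportion of hypothesis n-grams exceeding their reference
    counts. Under-translation: proportion of reference n-grams missing from the
    hypothesis. Lower is better for both."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    over = sum(max(c - ref[g], 0) for g, c in hyp.items())
    under = sum(max(c - hyp[g], 0) for g, c in ref.items())
    return over / max(sum(hyp.values()), 1), under / max(sum(ref.values()), 1)

hyp = "the cat the cat sat".split()
ref = "the cat sat on the mat".split()
print(otem_utem(hyp, ref))   # (0.5, 0.6): some repetition, some missing content
```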
U-Net: Machine Reading Comprehension with Unanswerable Questions
Title | U-Net: Machine Reading Comprehension with Unanswerable Questions |
Authors | Fu Sun, Linyang Li, Xipeng Qiu, Yang Liu |
Abstract | Machine reading comprehension with unanswerable questions is a new challenging task for natural language processing. A key subtask is to reliably predict whether the question is unanswerable. In this paper, we propose a unified model, called U-Net, with three important components: answer pointer, no-answer pointer, and answer verifier. We introduce a universal node and thus process the question and its context passage as a single contiguous sequence of tokens. The universal node encodes the fused information from both the question and passage, plays an important role in predicting whether the question is answerable, and also greatly improves the conciseness of the U-Net. Different from the state-of-the-art pipeline models, U-Net can be learned in an end-to-end fashion. The experimental results on the SQuAD 2.0 dataset show that U-Net can effectively predict the unanswerability of questions and achieves an F1 score of 71.7 on SQuAD 2.0. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.06638v1 |
http://arxiv.org/pdf/1810.06638v1.pdf | |
PWC | https://paperswithcode.com/paper/u-net-machine-reading-comprehension-with |
Repo | https://github.com/FudanNLP/UNet |
Framework | pytorch |
INFODENS: An Open-source Framework for Learning Text Representations
Title | INFODENS: An Open-source Framework for Learning Text Representations |
Authors | Ahmad Taie, Raphael Rubino, Josef van Genabith |
Abstract | The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering. While engineered representations are usually based on some linguistic understanding and are therefore more interpretable, learned representations are harder to interpret. Empirically studying the complementarity of both approaches can provide more linguistic insights that would help reach a better compromise between interpretability and performance. We present INFODENS, a framework for studying learned and engineered representations of text in the context of text classification tasks. It is designed to simplify the tasks of feature engineering as well as provide the groundwork for extracting learned features and combining both approaches. INFODENS is flexible, extensible, with a short learning curve, and is easy to integrate with many of the available and widely used natural language processing tools. |
Tasks | Feature Engineering, Representation Learning, Text Classification |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07091v1 |
http://arxiv.org/pdf/1810.07091v1.pdf | |
PWC | https://paperswithcode.com/paper/infodens-an-open-source-framework-for |
Repo | https://github.com/ahmad-taie/infodens |
Framework | tf |
Spectral Inference Networks: Unifying Deep and Spectral Learning
Title | Spectral Inference Networks: Unifying Deep and Spectral Learning |
Authors | David Pfau, Stig Petersen, Ashish Agarwal, David G. T. Barrett, Kimberly L. Stachenfeld |
Abstract | We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. Spectral Inference Networks generalize Slow Feature Analysis to generic symmetric operators, and are closely related to Variational Monte Carlo methods from computational physics. As such, they can be a powerful tool for unsupervised representation learning from video or graph-structured data. We cast training Spectral Inference Networks as a bilevel optimization problem, which allows for online learning of multiple eigenfunctions. We show results of training Spectral Inference Networks on problems in quantum mechanics and feature learning for videos on synthetic datasets. Our results demonstrate that Spectral Inference Networks accurately recover eigenfunctions of linear operators and can discover interpretable representations from video in a fully unsupervised manner. |
Tasks | Atari Games, bilevel optimization, Representation Learning, Stochastic Optimization, Unsupervised Representation Learning |
Published | 2018-06-06 |
URL | https://arxiv.org/abs/1806.02215v3 |
https://arxiv.org/pdf/1806.02215v3.pdf | |
PWC | https://paperswithcode.com/paper/spectral-inference-networks-unifying-spectral |
Repo | https://github.com/deepmind/spectral_inference_networks |
Framework | tf |
Bio-YODIE: A Named Entity Linking System for Biomedical Text
Title | Bio-YODIE: A Named Entity Linking System for Biomedical Text |
Authors | Genevieve Gorrell, Xingyi Song, Angus Roberts |
Abstract | Ever-expanding volumes of biomedical text require automated semantic annotation techniques to curate and put to best use. An established field of research seeks to link mentions in text to knowledge bases such as those included in the UMLS (Unified Medical Language System), in order to enable a more sophisticated understanding. This work has yielded good results for tasks such as curating literature, but increasingly, annotation systems are more broadly applied. Medical vocabularies are expanding in size, and with them the extent of term ambiguity. Document collections are increasing in size and complexity, creating a greater need for speed and robustness. Furthermore, as the technologies are turned to new tasks, requirements change; for example greater coverage of expressions may be required in order to annotate patient records, and greater accuracy may be needed for applications that affect patients. This places new demands on the approaches currently in use. In this work, we present a new system, Bio-YODIE, and compare it to two other popular systems in order to give guidance about suitable approaches in different scenarios and how systems might be designed to accommodate future needs. |
Tasks | Entity Linking |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04860v1 |
http://arxiv.org/pdf/1811.04860v1.pdf | |
PWC | https://paperswithcode.com/paper/bio-yodie-a-named-entity-linking-system-for |
Repo | https://github.com/GateNLP/Bio-YODIE |
Framework | none |
Dynamic Graph CNN for Learning on Point Clouds
Title | Dynamic Graph CNN for Learning on Point Clouds |
Authors | Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon |
Abstract | Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNN to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks including ModelNet40, ShapeNetPart, and S3DIS. |
Tasks | |
Published | 2018-01-24 |
URL | https://arxiv.org/abs/1801.07829v2 |
https://arxiv.org/pdf/1801.07829v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-graph-cnn-for-learning-on-point |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
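A minimal, unbatched EdgeConv layer can be written directly from the description above: build a k-NN graph in feature space, form edge features from each point and its neighbour offsets, apply a shared MLP, and max-aggregate. The tensor shapes and the tiny MLP are illustrative; the linked repository contains the full batched implementation.

```python
import torch
import torch.nn as nn

def edgeconv(x, mlp, k=8):
    """Unbatched EdgeConv sketch. x: (n, c) point features; mlp maps (..., 2c) -> (..., c_out).
    The graph is recomputed from the current features, which is what makes DGCNN 'dynamic'."""
    n, c = x.shape
    dist = ((x.unsqueeze(1) - x.unsqueeze(0)) ** 2).sum(-1)        # (n, n) squared distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]           # k nearest neighbours, self dropped
    neighbors = x[idx]                                             # (n, k, c)
    center = x.unsqueeze(1).expand(-1, k, -1)                      # (n, k, c)
    edge_feats = torch.cat([center, neighbors - center], dim=-1)   # (n, k, 2c)
    return mlp(edge_feats).max(dim=1).values                       # (n, c_out)

mlp = nn.Sequential(nn.Linear(2 * 3, 64), nn.ReLU())
points = torch.randn(128, 3)
features = edgeconv(points, mlp, k=8)    # (128, 64); further edgeconv layers can be stacked on `features`
```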