October 21, 2019

3038 words 15 mins read

Paper Group AWR 36

Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media. Kernel Exponential Family Estimation via Doubly Dual Embedding. Compact Generalized Non-local Network. Multi-Task Graph Autoencoders. HyperAdam: A Learnable Task-Adaptive Adam for Network Training. Towards Exploiting Background Knowledge for Building Conversation Systems …

Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media

Title Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media
Authors Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, Elizabeth Belding
Abstract While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech – its target: either “directed” towards a specific person or entity, or “generalized” towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate, is characterized by the use of lethal words such as murder, exterminate, and kill; and quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.04257v1
PDF http://arxiv.org/pdf/1804.04257v1.pdf
PWC https://paperswithcode.com/paper/hate-lingo-a-target-based-linguistic-analysis
Repo https://github.com/ben-aaron188/ucl_aca_20182019
Framework none
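
For intuition about the kind of target-based lexical analysis the abstract describes, here is a minimal sketch that compares word-category rates between two sets of messages. The category word lists and example texts are toy placeholders, not the authors' lexicons or data.

```python
# Toy lexicon-based category analysis in the spirit of the paper's
# psycholinguistic comparison. LEXICON and the example texts are hypothetical.
from collections import Counter

LEXICON = {
    "anger":    {"angry", "furious", "hate"},
    "lethal":   {"murder", "exterminate", "kill"},
    "quantity": {"million", "many", "all"},
}

def category_rates(texts):
    """Matches per 100 tokens for each lexical category over a corpus."""
    hits, total = Counter(), 0
    for text in texts:
        tokens = text.lower().split()
        total += len(tokens)
        for cat, words in LEXICON.items():
            hits[cat] += sum(tok in words for tok in tokens)
    return {cat: 100.0 * hits[cat] / max(total, 1) for cat in LEXICON}

# Comparing the two classes of hate speech then reduces to comparing profiles.
directed_rates = category_rates(["you make me so angry", "i am furious at you"])
generalized_rates = category_rates(["there are many of them", "a million of them"])
```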

Kernel Exponential Family Estimation via Doubly Dual Embedding

Title Kernel Exponential Family Estimation via Doubly Dual Embedding
Authors Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He
Abstract We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sampling in the inference stage. The resulting estimator can be easily generalized to kernel conditional exponential families. We establish a connection between kernel exponential family estimation and MMD-GANs, revealing a new perspective for understanding GANs. Compared to the score matching based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate while the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms state-of-the-art estimators.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02228v3
PDF http://arxiv.org/pdf/1811.02228v3.pdf
PWC https://paperswithcode.com/paper/kernel-exponential-family-estimation-via
Repo https://github.com/Hanjun-Dai/dde
Framework pytorch
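
For readers who want the setup behind the abstract, here is the standard form of a kernel exponential family and the penalized maximum log-likelihood objective. The notation (base measure q0, penalty lambda) is ours; the duality identity at the end is the textbook Fenchel/Donsker–Varadhan form that the doubly dual embedding builds on.

```latex
% Kernel exponential family: natural parameter f lives in an RKHS \mathcal{H}.
\[
  p_f(x) = q_0(x)\,\exp\!\bigl(f(x) - A(f)\bigr),
  \qquad
  A(f) = \log \int q_0(x)\, e^{f(x)}\, \mathrm{d}x .
\]
% Penalized maximum log-likelihood over samples x_1, ..., x_n:
\[
  \hat{f} = \arg\max_{f \in \mathcal{H}}
  \;\frac{1}{n}\sum_{i=1}^{n} f(x_i) \;-\; A(f) \;-\; \frac{\lambda}{2}\,\|f\|_{\mathcal{H}}^{2} .
\]
% The intractable log-partition A(f) admits the dual representation
\[
  A(f) = \max_{q}\; \mathbb{E}_{q}\bigl[f(x)\bigr] - \mathrm{KL}(q \,\|\, q_0),
\]
% which is the handle used to avoid computing A(f) directly.
```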

Compact Generalized Non-local Network

Title Compact Generalized Non-local Network
Authors Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu
Abstract The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos. Although having shown excellent performance, it lacks the mechanism to model the interactions between positions across channels, which are of vital importance in recognizing fine-grained objects and actions. To address this limitation, we generalize the non-local module and take the correlations between the positions of any two channels into account. This extension uses a compact representation of multiple kernel functions via Taylor expansion, which keeps the generalized non-local module in a fast, low-complexity computation flow. Moreover, we implement our generalized non-local method within channel groups to ease the optimization. Experimental results illustrate the clear-cut improvements and practical applicability of the generalized non-local module on both fine-grained object recognition and video classification. Code is available at: https://github.com/KaiyuYue/cgnl-network.pytorch.
Tasks Object Detection, Object Recognition, Video Classification
Published 2018-10-31
URL http://arxiv.org/abs/1810.13125v2
PDF http://arxiv.org/pdf/1810.13125v2.pdf
PWC https://paperswithcode.com/paper/compact-generalized-non-local-network
Repo https://github.com/KaiyuYue/cgnl-network.pytorch
Framework pytorch
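
The sketch below illustrates the distinction the abstract draws: a standard non-local block lets positions interact but keeps channels independent, while the generalized version flattens position-channel pairs into one axis so they all interact. The shapes and the dot-product similarity are illustrative assumptions; the paper additionally uses a Taylor expansion of kernel functions to avoid the naive quadratic cost shown here.

```python
# Minimal NumPy sketch of a non-local block and the "generalized" flattening.
import numpy as np

def nonlocal_block(x):
    """Standard non-local: x is (N, C) with N positions.
    Each position attends to all others; channels do not interact."""
    theta, phi, g = x, x, x                     # identity embeddings for brevity
    attn = theta @ phi.T                        # (N, N) pairwise similarities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over positions
    return attn @ g                             # (N, C)

def generalized_nonlocal_block(x):
    """Generalized non-local: flatten (N, C) into (N*C, 1) so every
    position-channel pair can interact with every other pair."""
    v = x.reshape(-1, 1)                        # (N*C, 1)
    return nonlocal_block(v).reshape(x.shape)   # naive O((N*C)^2) version

x = np.random.randn(16, 8)                      # 16 positions, 8 channels
y = generalized_nonlocal_block(x)
```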

Multi-Task Graph Autoencoders

Title Multi-Task Graph Autoencoders
Authors Phi Vu Tran
Abstract We examine two fundamental tasks associated with graph representation learning: link prediction and node classification. We present a new autoencoder architecture capable of learning a joint representation of local graph structure and available node features for the simultaneous multi-task learning of unsupervised link prediction and semi-supervised node classification. Our simple, yet effective and versatile model is efficiently trained end-to-end in a single stage, whereas previous related deep graph embedding methods require multiple training steps that are difficult to optimize. We provide an empirical evaluation of our model on five benchmark relational, graph-structured datasets and demonstrate significant improvement over three strong baselines for graph representation learning. Reference code and data are available at https://github.com/vuptran/graph-representation-learning
Tasks Graph Embedding, Graph Representation Learning, Link Prediction, Multi-Task Learning, Node Classification, Representation Learning
Published 2018-11-07
URL http://arxiv.org/abs/1811.02798v1
PDF http://arxiv.org/pdf/1811.02798v1.pdf
PWC https://paperswithcode.com/paper/multi-task-graph-autoencoders
Repo https://github.com/vuptran/graph-representation-learning
Framework tf
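
A hedged sketch of the multi-task idea: one shared graph encoder produces latent node embeddings that feed both an inner-product decoder reconstructing the adjacency matrix (link prediction) and a softmax head over node labels (node classification). The layer sizes and activation choices below are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(adj, feats, w_enc, w_cls):
    """adj: (N, N), feats: (N, F); returns reconstructed adj and class probs."""
    h = np.tanh(adj @ feats @ w_enc)            # shared latent embeddings (N, D)
    adj_hat = sigmoid(h @ h.T)                  # link prediction: edge probabilities
    logits = h @ w_cls                          # node classification head (N, K)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return adj_hat, probs / probs.sum(axis=1, keepdims=True)

N, F, D, K = 10, 5, 4, 3
rng = np.random.default_rng(0)
adj = (rng.random((N, N)) > 0.7).astype(float)
adj_hat, probs = forward(adj, rng.normal(size=(N, F)),
                         rng.normal(size=(F, D)), rng.normal(size=(D, K)))
# Training would sum a reconstruction loss on adj_hat and a classification
# loss on probs for the labeled nodes, optimizing both heads jointly.
```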

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Title HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Authors Shipeng Wang, Jian Sun, Zongben Xu
Abstract Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience in human-designed optimizers, and therefore have limited generalization ability. In this paper, a new optimizer, dubbed “HyperAdam”, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is shown to achieve state-of-the-art results for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
Tasks Stochastic Optimization
Published 2018-11-22
URL http://arxiv.org/abs/1811.08996v1
PDF http://arxiv.org/pdf/1811.08996v1.pdf
PWC https://paperswithcode.com/paper/hyperadam-a-learnable-task-adaptive-adam-for
Repo https://github.com/ShipengWang/HyperAdam-Tensorflow
Framework tf
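
To make the combination mechanism concrete, here is an illustrative sketch of one update step as a weighted mix of Adam-style candidates with different decay rates. In the paper the weights and decay rates come from a learned RNN; here they are fixed placeholders, each candidate uses a single decay rate for both moments, and bias correction is omitted for brevity.

```python
import numpy as np

def multi_adam_update(grad, state, lr=1e-3, betas=(0.5, 0.9, 0.99),
                      weights=(0.2, 0.3, 0.5), eps=1e-8):
    """One parameter update as a weighted combination of Adam-like candidates,
    each with a different exponential decay rate for its moving averages."""
    update = np.zeros_like(grad)
    for k, (b, w) in enumerate(zip(betas, weights)):
        m, v = state[k]
        m = b * m + (1 - b) * grad          # first-moment estimate
        v = b * v + (1 - b) * grad**2       # second-moment estimate
        state[k] = (m, v)
        update += w * m / (np.sqrt(v) + eps)
    return -lr * update

theta = np.array([1.0, -2.0])
state = [(np.zeros_like(theta), np.zeros_like(theta)) for _ in range(3)]
for _ in range(100):
    grad = 2 * theta                        # gradient of ||theta||^2
    theta += multi_adam_update(grad, state)
```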

Towards Exploiting Background Knowledge for Building Conversation Systems

Title Towards Exploiting Background Knowledge for Building Conversation Systems
Authors Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra
Abstract Existing dialog datasets contain a sequence of utterances and responses without any explicit background knowledge associated with them. This has resulted in the development of models which treat conversation as a sequence-to-sequence generation task (i.e., given a sequence of utterances, generate the response sequence). This is not only an overly simplistic view of conversation but it is also emphatically different from the way humans converse by heavily relying on their background knowledge about the topic (as opposed to simply relying on the previous sequence of utterances). For example, it is common for humans to (involuntarily) produce utterances which are copied or suitably modified from background articles they have read about the topic. To facilitate the development of such natural conversation models which mimic the human process of conversing, we create a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie. We establish baseline results on this dataset (90K utterances from 9K conversations) using three different models: (i) pure generation based models which ignore the background knowledge (ii) generation based models which learn to copy information from the background knowledge when required and (iii) span prediction based models which predict the appropriate response span in the background knowledge.
Tasks
Published 2018-09-21
URL http://arxiv.org/abs/1809.08205v1
PDF http://arxiv.org/pdf/1809.08205v1.pdf
PWC https://paperswithcode.com/paper/towards-exploiting-background-knowledge-for
Repo https://github.com/nikitacs16/Holl-E
Framework none

Learning Global Additive Explanations for Neural Nets Using Model Distillation

Title Learning Global Additive Explanations for Neural Nets Using Model Distillation
Authors Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo
Abstract Interpretability has largely focused on local explanations, i.e. explaining why a model made a particular prediction for a sample. These explanations are appealing due to their simplicity and local fidelity. However, they do not provide information about the general behavior of the model. We propose to leverage model distillation to learn global additive explanations that describe the relationship between input features and model predictions. These global explanations take the form of feature shapes, which are more expressive than feature attributions. Through careful experimentation, we show qualitatively and quantitatively that global additive explanations are able to describe model behavior and yield insights about models such as neural nets. A visualization of our approach applied to a neural net as it is trained is available at https://youtu.be/ErQYwNqzEdc.
Tasks
Published 2018-01-26
URL http://arxiv.org/abs/1801.08640v2
PDF http://arxiv.org/pdf/1801.08640v2.pdf
PWC https://paperswithcode.com/paper/learning-global-additive-explanations-for
Repo https://github.com/aclarkData/MachineLearningInterpretability
Framework none
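
A minimal sketch of the distillation idea: fit one 1-D "feature shape" per input feature so that their sum matches the teacher's predictions. Binned means updated by backfitting stand in here for the more expressive shape learners (e.g., trees or splines) that the paper considers; the teacher function below is a stand-in for a trained neural net.

```python
import numpy as np

def fit_feature_shapes(x, teacher_pred, n_bins=10, n_rounds=5):
    """Learn additive shapes h_j so that bias + sum_j h_j(x_j) ~ teacher_pred."""
    n, d = x.shape
    edges = [np.quantile(x[:, j], np.linspace(0, 1, n_bins + 1)) for j in range(d)]
    idx = [np.digitize(x[:, j], edges[j][1:-1]) for j in range(d)]  # bin per sample
    shapes = np.zeros((d, n_bins))
    bias = teacher_pred.mean()
    for _ in range(n_rounds):                       # backfitting rounds
        for j in range(d):
            others = bias + sum(shapes[k][idx[k]] for k in range(d) if k != j)
            resid = teacher_pred - others
            for b in range(n_bins):                 # bin-wise mean of residuals
                mask = idx[j] == b
                if mask.any():
                    shapes[j, b] = resid[mask].mean()
    return bias, shapes

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
teacher = x[:, 0] ** 2 + np.sin(x[:, 1])            # stand-in for a neural net
bias, shapes = fit_feature_shapes(x, teacher)       # shapes[j] is feature j's curve
```

Plotting each row of `shapes` against its bin centers gives the global "feature shape" visualizations the abstract refers to.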

Can recurrent neural networks warp time?

Title Can recurrent neural networks warp time?
Authors Corentin Tallec, Yann Ollivier
Abstract Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort.
Tasks
Published 2018-03-23
URL http://arxiv.org/abs/1804.11188v1
PDF http://arxiv.org/pdf/1804.11188v1.pdf
PWC https://paperswithcode.com/paper/can-recurrent-neural-networks-warp-time
Repo https://github.com/AravindGanesh/ChronoLSTM
Framework none
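
The paper's concrete recipe, chrono initialization, draws the forget-gate bias as log(U[1, T_max − 1]) and sets the input-gate bias to its negative, so the gates start out retaining information over timescales up to a chosen T_max. The sketch below applies it to a PyTorch LSTM, whose bias vector is laid out as [input | forget | cell | output] chunks; putting the chrono values in bias_ih and zeroing bias_hh (so the effective bias equals their sum) is one common convention, not prescribed by the paper.

```python
import torch

def chrono_init(lstm: torch.nn.LSTM, t_max: int):
    """Chrono initialization of LSTM gate biases (Tallec & Ollivier)."""
    h = lstm.hidden_size
    for name, p in lstm.named_parameters():
        if "bias_ih" in name:
            with torch.no_grad():
                bf = torch.log(torch.empty(h).uniform_(1, t_max - 1))
                p[h:2 * h] = bf      # forget-gate bias: log(U[1, T_max - 1])
                p[0:h] = -bf         # input-gate bias: the negative of it
        elif "bias_hh" in name:
            torch.nn.init.zeros_(p)  # keep the effective bias equal to bias_ih

lstm = torch.nn.LSTM(input_size=32, hidden_size=64)
chrono_init(lstm, t_max=100)         # T_max ~ longest dependency you expect
```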

Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting

Title Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting
Authors Bin Wang, Jie Lu, Zheng Yan, Huaishao Luo, Tianrui Li, Yu Zheng, Guangquan Zhang
Abstract Weather forecasting is usually solved through numerical weather prediction (NWP), which can sometimes lead to unsatisfactory performance due to inappropriate setting of the initial states. In this paper, we design a data-driven method augmented by an effective information fusion mechanism to learn from historical data that incorporates prior knowledge from NWP. We cast the weather forecasting problem as an end-to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (NLE) loss function. A notable advantage of our proposed method is that it simultaneously implements single-value forecasting and uncertainty quantification, which we refer to as deep uncertainty quantification (DUQ). Efficient deep ensemble strategies are also explored to further improve performance. This new approach was evaluated on a public dataset collected from weather stations in Beijing, China. Experimental results demonstrate that the proposed NLE loss significantly improves generalization compared to mean squared error (MSE) loss and mean absolute error (MAE) loss. Compared with NWP, this approach significantly improves accuracy by 47.76%, which is a state-of-the-art result on this benchmark dataset. The preliminary version of the proposed method won 2nd place in an online competition for daily weather forecasting.
Tasks Weather Forecasting
Published 2018-12-22
URL http://arxiv.org/abs/1812.09467v3
PDF http://arxiv.org/pdf/1812.09467v3.pdf
PWC https://paperswithcode.com/paper/deep-uncertainty-quantification-a-machine
Repo https://github.com/BruceBinBoxing/WF
Framework tf
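
Since the abstract's key ingredient is a negative log-likelihood loss that yields both a point forecast and an uncertainty estimate, here is the standard heteroscedastic Gaussian form of such a loss: the network predicts a mean and a log-variance per target. This is a common formulation consistent with the abstract; the paper's exact NLE parameterization may differ.

```python
import numpy as np

def gaussian_nle(y, mu, log_var):
    """Per-sample negative log-likelihood of y under N(mu, exp(log_var)).
    Predicting log-variance keeps the variance positive and training stable."""
    return 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var) + np.log(2 * np.pi))

y = np.array([21.3, 19.8])            # observed temperatures
mu = np.array([20.9, 20.5])           # predicted means (the point forecast)
log_var = np.array([0.1, 0.4])        # predicted log-variances (the uncertainty)
loss = gaussian_nle(y, mu, log_var).mean()
```

Minimizing this loss trades off fit against claimed confidence: a forecast can only shrink its variance where its mean is actually accurate.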

Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT

Title Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT
Authors Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, Jinsong Su
Abstract Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], of which studies have become research hotspots in NMT. At present, these studies mainly apply the dominant automatic evaluation metrics, such as BLEU, to evaluate the overall translation quality with respect to both adequacy and fluency. However, they are unable to accurately measure the ability of NMT systems in dealing with the above-mentioned issues. In this paper, we propose two quantitative metrics, the Otem and Utem, to automatically evaluate the system performance in terms of over- and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between gold reference and system translation. We evaluate both metrics by comparing their scores with human evaluations, where the values of Pearson Correlation Coefficient reveal their strong correlation. Moreover, in-depth analyses on various translation systems indicate some inconsistency between BLEU and our proposed metrics, highlighting the necessity and significance of our metrics.
Tasks Machine Translation
Published 2018-07-24
URL http://arxiv.org/abs/1807.08945v1
PDF http://arxiv.org/pdf/1807.08945v1.pdf
PWC https://paperswithcode.com/paper/otemutem-over-and-under-translation
Repo https://github.com/DeepLearnXMU/Otem-Utem
Framework none
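
An illustrative computation of the mismatched n-gram proportions the abstract describes: over-translation looks at hypothesis n-grams in excess of the reference, under-translation at reference n-grams the hypothesis misses. The clipping and length handling in the paper's exact Otem/Utem formulas may differ from this sketch.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mismatch_rate(src_counts, tgt_counts):
    """Proportion of n-grams in src_counts not covered by tgt_counts."""
    total = sum(src_counts.values())
    missed = sum(max(c - tgt_counts[g], 0) for g, c in src_counts.items())
    return missed / total if total else 0.0

hyp = "the cat the cat sat".split()
ref = "the cat sat on the mat".split()
h2, r2 = ngrams(hyp, 2), ngrams(ref, 2)
otem = mismatch_rate(h2, r2)   # over-translation: excess hypothesis bigrams
utem = mismatch_rate(r2, h2)   # under-translation: missing reference bigrams
```

On this toy pair the hypothesis repeats "the cat" (over-translation, otem = 0.5) and drops "on the mat" (under-translation, utem = 0.6), which BLEU alone would not separate.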

U-Net: Machine Reading Comprehension with Unanswerable Questions

Title U-Net: Machine Reading Comprehension with Unanswerable Questions
Authors Fu Sun, Linyang Li, Xipeng Qiu, Yang Liu
Abstract Machine reading comprehension with unanswerable questions is a new challenging task for natural language processing. A key subtask is to reliably predict whether the question is unanswerable. In this paper, we propose a unified model, called U-Net, with three important components: answer pointer, no-answer pointer, and answer verifier. We introduce a universal node and thus process the question and its context passage as a single contiguous sequence of tokens. The universal node encodes the fused information from both the question and passage, and plays an important role in predicting whether the question is answerable; it also greatly improves the conciseness of the U-Net. Different from the state-of-the-art pipeline models, U-Net can be learned in an end-to-end fashion. The experimental results on the SQuAD 2.0 dataset show that U-Net can effectively predict the unanswerability of questions and achieves an F1 score of 71.7 on SQuAD 2.0.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2018-10-12
URL http://arxiv.org/abs/1810.06638v1
PDF http://arxiv.org/pdf/1810.06638v1.pdf
PWC https://paperswithcode.com/paper/u-net-machine-reading-comprehension-with
Repo https://github.com/FudanNLP/UNet
Framework pytorch
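
A small sketch of the universal-node idea only: prepend one learned vector to the concatenated question and passage embeddings so the pair is processed as a single sequence, then read an answerability score off that node's encoding. The encoder is a placeholder, and the model's other components (answer pointer, no-answer pointer, verifier) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
universal = rng.normal(size=(1, d))               # learned universal-node vector
question = rng.normal(size=(7, d))                # 7 question token embeddings
passage = rng.normal(size=(40, d))                # 40 passage token embeddings

# One contiguous sequence: [universal node; question; passage]
seq = np.concatenate([universal, question, passage], axis=0)
encoded = np.tanh(seq @ rng.normal(size=(d, d)))  # stand-in for the encoder
w_na = rng.normal(size=(d,))
no_answer_score = encoded[0] @ w_na               # read off the universal node
```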

INFODENS: An Open-source Framework for Learning Text Representations

Title INFODENS: An Open-source Framework for Learning Text Representations
Authors Ahmad Taie, Raphael Rubino, Josef van Genabith
Abstract The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering. While engineered representations are usually based on some linguistic understanding and are therefore more interpretable, learned representations are harder to interpret. Empirically studying the complementarity of both approaches can provide more linguistic insights that would help reach a better compromise between interpretability and performance. We present INFODENS, a framework for studying learned and engineered representations of text in the context of text classification tasks. It is designed to simplify the tasks of feature engineering as well as provide the groundwork for extracting learned features and combining both approaches. INFODENS is flexible and extensible, has a short learning curve, and is easy to integrate with many of the available and widely used natural language processing tools.
Tasks Feature Engineering, Representation Learning, Text Classification
Published 2018-10-16
URL http://arxiv.org/abs/1810.07091v1
PDF http://arxiv.org/pdf/1810.07091v1.pdf
PWC https://paperswithcode.com/paper/infodens-an-open-source-framework-for
Repo https://github.com/ahmad-taie/infodens
Framework tf

Spectral Inference Networks: Unifying Deep and Spectral Learning

Title Spectral Inference Networks: Unifying Deep and Spectral Learning
Authors David Pfau, Stig Petersen, Ashish Agarwal, David G. T. Barrett, Kimberly L. Stachenfeld
Abstract We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. Spectral Inference Networks generalize Slow Feature Analysis to generic symmetric operators, and are closely related to Variational Monte Carlo methods from computational physics. As such, they can be a powerful tool for unsupervised representation learning from video or graph-structured data. We cast training Spectral Inference Networks as a bilevel optimization problem, which allows for online learning of multiple eigenfunctions. We show results of training Spectral Inference Networks on problems in quantum mechanics and feature learning for videos on synthetic datasets. Our results demonstrate that Spectral Inference Networks accurately recover eigenfunctions of linear operators and can discover interpretable representations from video in a fully unsupervised manner.
Tasks Atari Games, bilevel optimization, Representation Learning, Stochastic Optimization, Unsupervised Representation Learning
Published 2018-06-06
URL https://arxiv.org/abs/1806.02215v3
PDF https://arxiv.org/pdf/1806.02215v3.pdf
PWC https://paperswithcode.com/paper/spectral-inference-networks-unifying-spectral
Repo https://github.com/deepmind/spectral_inference_networks
Framework tf
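
For the variational picture behind the abstract: the top-K eigenfunctions of a symmetric operator solve a trace maximization under an orthonormality constraint on the data distribution. The notation below (Sigma, Pi) is ours; the unconstrained surrogate in the second display matches the general form the paper optimizes, and estimating Sigma online while updating the network is what makes this a bilevel stochastic optimization problem.

```latex
% Top-K eigenfunctions u = (u_1, ..., u_K) of a symmetric operator \mathcal{K}:
\[
  \max_{u}\; \operatorname{Tr}\,\mathbb{E}_{x}\!\left[u(x)\,(\mathcal{K}u)(x)^{\top}\right]
  \quad \text{s.t.} \quad
  \mathbb{E}_{x}\!\left[u(x)\,u(x)^{\top}\right] = I .
\]
% Folding the constraint into the objective gives the unconstrained surrogate
\[
  \max_{u}\; \operatorname{Tr}\!\left(\Sigma^{-1}\Pi\right),
  \qquad
  \Sigma = \mathbb{E}_{x}\!\left[u(x)\,u(x)^{\top}\right],
  \quad
  \Pi = \mathbb{E}_{x}\!\left[u(x)\,(\mathcal{K}u)(x)^{\top}\right],
\]
% where u is parameterized by a neural network and the expectations are
% estimated from minibatches.
```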

Bio-YODIE: A Named Entity Linking System for Biomedical Text

Title Bio-YODIE: A Named Entity Linking System for Biomedical Text
Authors Genevieve Gorrell, Xingyi Song, Angus Roberts
Abstract Ever-expanding volumes of biomedical text require automated semantic annotation techniques to curate and put to best use. An established field of research seeks to link mentions in text to knowledge bases such as those included in the UMLS (Unified Medical Language System), in order to enable a more sophisticated understanding. This work has yielded good results for tasks such as curating literature, but increasingly, annotation systems are more broadly applied. Medical vocabularies are expanding in size, and with them the extent of term ambiguity. Document collections are increasing in size and complexity, creating a greater need for speed and robustness. Furthermore, as the technologies are turned to new tasks, requirements change; for example greater coverage of expressions may be required in order to annotate patient records, and greater accuracy may be needed for applications that affect patients. This places new demands on the approaches currently in use. In this work, we present a new system, Bio-YODIE, and compare it to two other popular systems in order to give guidance about suitable approaches in different scenarios and how systems might be designed to accommodate future needs.
Tasks Entity Linking
Published 2018-11-12
URL http://arxiv.org/abs/1811.04860v1
PDF http://arxiv.org/pdf/1811.04860v1.pdf
PWC https://paperswithcode.com/paper/bio-yodie-a-named-entity-linking-system-for
Repo https://github.com/GateNLP/Bio-YODIE
Framework none

Dynamic Graph CNN for Learning on Point Clouds

Title Dynamic Graph CNN for Learning on Point Clouds
Authors Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon
Abstract Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNNs to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks including ModelNet40, ShapeNetPart, and S3DIS.
Tasks
Published 2018-01-24
URL https://arxiv.org/abs/1801.07829v2
PDF https://arxiv.org/pdf/1801.07829v2.pdf
PWC https://paperswithcode.com/paper/dynamic-graph-cnn-for-learning-on-point
Repo https://github.com/AnTao97/UnsupervisedPointCloudReconstruction
Framework pytorch
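
A compact sketch of one EdgeConv layer as the abstract describes it: build a kNN graph in feature space, form edge features from each point and its neighbor offsets, pass them through a shared MLP (a single linear layer plus ReLU here, an assumption for brevity), and max-pool over each point's neighbors. Recomputing the kNN graph from the output features at the next layer is what makes the graph "dynamic".

```python
import numpy as np

def edgeconv(x, w, k=4):
    """x: (N, C) point features; w: (2C, D) shared MLP weights; returns (N, D)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]            # k nearest, excluding self
    xi = np.repeat(x[:, None, :], k, axis=1)             # (N, k, C) center copies
    xj = x[nbrs]                                         # (N, k, C) neighbor features
    edge = np.concatenate([xi, xj - xi], axis=-1)        # (N, k, 2C) edge features
    h = np.maximum(edge @ w, 0.0)                        # shared MLP + ReLU
    return h.max(axis=1)                                 # max-pool over neighbors

pts = np.random.randn(128, 3)                            # toy point cloud
w = np.random.randn(6, 32)
features = edgeconv(pts, w)                              # (128, 32); the next layer
# would recompute kNN on `features` rather than on the input coordinates.
```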