Paper Group AWR 36
Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media. Kernel Exponential Family Estimation via Doubly Dual Embedding. Compact Generalized Non-local Network. Multi-Task Graph Autoencoders. HyperAdam: A Learnable Task-Adaptive Adam for Network Training. Towards Exploiting Background Knowledge for Building Conversation Systems …
Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media
Title | Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media |
Authors | Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang, Elizabeth Belding |
Abstract | While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech – its target: either “directed” towards a specific person or entity, or “generalized” towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate, is characterized by the use of lethal words such as murder, exterminate, and kill; and quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04257v1 |
http://arxiv.org/pdf/1804.04257v1.pdf | |
PWC | https://paperswithcode.com/paper/hate-lingo-a-target-based-linguistic-analysis |
Repo | https://github.com/ben-aaron188/ucl_aca_20182019 |
Framework | none |
Kernel Exponential Family Estimation via Doubly Dual Embedding
Title | Kernel Exponential Family Estimation via Doubly Dual Embedding |
Authors | Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He |
Abstract | We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sampling in the inference stage. The resulting estimator can be easily generalized to kernel conditional exponential families. We establish a connection between kernel exponential family estimation and MMD-GANs, revealing a new perspective for understanding GANs. Compared to the score matching based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate while the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms state-of-the-art |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02228v3 |
http://arxiv.org/pdf/1811.02228v3.pdf | |
PWC | https://paperswithcode.com/paper/kernel-exponential-family-estimation-via |
Repo | https://github.com/Hanjun-Dai/dde |
Framework | pytorch |
Compact Generalized Non-local Network
Title | Compact Generalized Non-local Network |
Authors | Kaiyu Yue, Ming Sun, Yuchen Yuan, Feng Zhou, Errui Ding, Fuxin Xu |
Abstract | The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos. Although it has shown excellent performance, it lacks the mechanism to model the interactions between positions across channels, which are of vital importance in recognizing fine-grained objects and actions. To address this limitation, we generalize the non-local module and take the correlations between the positions of any two channels into account. This extension uses a compact representation of multiple kernel functions via Taylor expansion, which keeps the generalized non-local module fast and low in computational complexity. Moreover, we implement our generalized non-local method within channel groups to ease the optimization. Experimental results illustrate the clear-cut improvements and practical applicability of the generalized non-local module on both fine-grained object recognition and video classification. Code is available at: https://github.com/KaiyuYue/cgnl-network.pytorch. |
Tasks | Object Detection, Object Recognition, Video Classification |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13125v2 |
http://arxiv.org/pdf/1810.13125v2.pdf | |
PWC | https://paperswithcode.com/paper/compact-generalized-non-local-network |
Repo | https://github.com/KaiyuYue/cgnl-network.pytorch |
Framework | pytorch |
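The grouped channel-position attention described in the abstract above can be illustrated with a toy PyTorch module. This is only a sketch under simplifying assumptions: it uses a plain dot-product kernel in place of the paper's Taylor-expanded kernels, and the module name and layer sizes are invented for readability — the linked repository contains the actual CGNL implementation.

```python
import torch
import torch.nn as nn

class ToyGroupedNonLocal(nn.Module):
    """Illustrative sketch: attention over all channel-position pairs within each
    channel group, using a simple dot-product kernel instead of the paper's
    Taylor-expanded kernel functions."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.theta = nn.Conv2d(channels, channels, 1)
        self.phi = nn.Conv2d(channels, channels, 1)
        self.g = nn.Conv2d(channels, channels, 1)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        m = (c // self.groups) * h * w
        # Flatten each channel group into one long vector of channel-position responses.
        t = self.theta(x).view(n, self.groups, m)
        p = self.phi(x).view(n, self.groups, m)
        g = self.g(x).view(n, self.groups, m)
        # Dot-product kernel: y_i = theta_i * sum_j phi_j * g_j; associativity keeps the cost linear in m.
        scale = (p * g).sum(dim=-1, keepdim=True) / m
        y = (t * scale).view(n, c, h, w)
        return x + self.out(y)

x = torch.randn(2, 16, 8, 8)
print(ToyGroupedNonLocal(16)(x).shape)   # torch.Size([2, 16, 8, 8])
```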
Multi-Task Graph Autoencoders
Title | Multi-Task Graph Autoencoders |
Authors | Phi Vu Tran |
Abstract | We examine two fundamental tasks associated with graph representation learning: link prediction and node classification. We present a new autoencoder architecture capable of learning a joint representation of local graph structure and available node features for the simultaneous multi-task learning of unsupervised link prediction and semi-supervised node classification. Our simple, yet effective and versatile model is efficiently trained end-to-end in a single stage, whereas previous related deep graph embedding methods require multiple training steps that are difficult to optimize. We provide an empirical evaluation of our model on five benchmark relational, graph-structured datasets and demonstrate significant improvement over three strong baselines for graph representation learning. Reference code and data are available at https://github.com/vuptran/graph-representation-learning |
Tasks | Graph Embedding, Graph Representation Learning, Link Prediction, Multi-Task Learning, Node Classification, Representation Learning |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02798v1 |
http://arxiv.org/pdf/1811.02798v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-graph-autoencoders |
Repo | https://github.com/vuptran/graph-representation-learning |
Framework | tf |
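The shared-encoder, two-head setup described in the abstract can be sketched in a few lines. This is a hedged illustration only: the layer sizes, the concatenation of adjacency rows with node features, and the unweighted loss sum are assumptions for readability, not the paper's exact architecture (see the linked repo for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiTaskGraphAE(nn.Module):
    """Sketch of the multi-task idea: a shared encoder over each node's adjacency row
    (concatenated with its features) feeds a link-prediction decoder and a node
    classifier. Details here are illustrative, not the paper's."""
    def __init__(self, n_nodes, n_feats, n_classes, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_nodes + n_feats, hidden), nn.ReLU())
        self.link_decoder = nn.Linear(hidden, n_nodes)   # reconstruct the adjacency row
        self.classifier = nn.Linear(hidden, n_classes)   # predict the node label

    def forward(self, adj_row, feats):
        z = self.encoder(torch.cat([adj_row, feats], dim=-1))
        return self.link_decoder(z), self.classifier(z)

def multitask_loss(adj_logits, cls_logits, adj_row, labels, labeled_mask):
    # Unsupervised link reconstruction on every node; supervised classification on labelled nodes only.
    link = F.binary_cross_entropy_with_logits(adj_logits, adj_row)
    cls = F.cross_entropy(cls_logits[labeled_mask], labels[labeled_mask])
    return link + cls
```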
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Title | HyperAdam: A Learnable Task-Adaptive Adam for Network Training |
Authors | Shipeng Wang, Jian Sun, Zongben Xu |
Abstract | Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embodied in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is shown to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs, and LSTMs. |
Tasks | Stochastic Optimization |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.08996v1 |
http://arxiv.org/pdf/1811.08996v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperadam-a-learnable-task-adaptive-adam-for |
Repo | https://github.com/ShipengWang/HyperAdam-Tensorflow |
Framework | tf |
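A toy version of the combination described in the abstract above is sketched below. It only illustrates mixing several Adam-style updates with different decay rates; the fixed weights, the omitted bias correction, and the function name are simplifications — in HyperAdam the weights and decay rates are produced adaptively by the learned recurrent cells (AdamCell, WeightCell, StateCell).

```python
import torch

def mixed_adam_update(grad, state, betas=((0.9, 0.999), (0.8, 0.99), (0.95, 0.9999)),
                      weights=None, lr=1e-3, eps=1e-8):
    """Toy sketch: compute several Adam-style updates with different (beta1, beta2)
    pairs and return their weighted combination. Bias correction is omitted and the
    weights are fixed; HyperAdam learns them adaptively per task."""
    if weights is None:
        weights = [1.0 / len(betas)] * len(betas)
    update = torch.zeros_like(grad)
    for i, (b1, b2) in enumerate(betas):
        m, v = state.get(i, (torch.zeros_like(grad), torch.zeros_like(grad)))
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        state[i] = (m, v)
        update = update + weights[i] * m / (v.sqrt() + eps)
    return -lr * update

# Minimal usage on a stand-in objective ||w||^2:
state, w = {}, torch.randn(10)
for _ in range(3):
    w = w + mixed_adam_update(2 * w, state)
```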
Towards Exploiting Background Knowledge for Building Conversation Systems
Title | Towards Exploiting Background Knowledge for Building Conversation Systems |
Authors | Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra |
Abstract | Existing dialog datasets contain a sequence of utterances and responses without any explicit background knowledge associated with them. This has resulted in the development of models which treat conversation as a sequence-to-sequence generation task (i.e., given a sequence of utterances, generate the response sequence). This is not only an overly simplistic view of conversation but it is also emphatically different from the way humans converse by heavily relying on their background knowledge about the topic (as opposed to simply relying on the previous sequence of utterances). For example, it is common for humans to (involuntarily) produce utterances which are copied or suitably modified from background articles they have read about the topic. To facilitate the development of such natural conversation models which mimic the human process of conversing, we create a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie. We establish baseline results on this dataset (90K utterances from 9K conversations) using three different models: (i) pure generation based models which ignore the background knowledge, (ii) generation based models which learn to copy information from the background knowledge when required, and (iii) span prediction based models which predict the appropriate response span in the background knowledge. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08205v1 |
http://arxiv.org/pdf/1809.08205v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-exploiting-background-knowledge-for |
Repo | https://github.com/nikitacs16/Holl-E |
Framework | none |
Learning Global Additive Explanations for Neural Nets Using Model Distillation
Title | Learning Global Additive Explanations for Neural Nets Using Model Distillation |
Authors | Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo |
Abstract | Interpretability has largely focused on local explanations, i.e. explaining why a model made a particular prediction for a sample. These explanations are appealing due to their simplicity and local fidelity. However, they do not provide information about the general behavior of the model. We propose to leverage model distillation to learn global additive explanations that describe the relationship between input features and model predictions. These global explanations take the form of feature shapes, which are more expressive than feature attributions. Through careful experimentation, we show qualitatively and quantitatively that global additive explanations are able to describe model behavior and yield insights about models such as neural nets. A visualization of our approach applied to a neural net as it is trained is available at https://youtu.be/ErQYwNqzEdc. |
Tasks | |
Published | 2018-01-26 |
URL | http://arxiv.org/abs/1801.08640v2 |
http://arxiv.org/pdf/1801.08640v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-global-additive-explanations-for |
Repo | https://github.com/aclarkData/MachineLearningInterpretability |
Framework | none |
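The distillation step can be illustrated with a minimal backfitting loop that fits one "feature shape" per input feature to the teacher's predictions. This is a sketch under assumptions: the paper uses more expressive shapes (splines and bagged trees) and a more careful fitting procedure, so the polynomial shapes and function name here are only for illustration.

```python
import numpy as np

def distill_additive(teacher_predict, X, degree=3, n_iter=10):
    """Fit a global additive surrogate f(x) ~= b + sum_j s_j(x_j) to a black-box model
    by backfitting polynomial shape functions against the teacher's predictions."""
    y = teacher_predict(X)                       # teacher scores on the training inputs
    n, d = X.shape
    coefs = [np.zeros(degree + 1) for _ in range(d)]
    intercept = float(y.mean())
    for _ in range(n_iter):
        for j in range(d):
            # Residual after subtracting every other feature's current shape function.
            others = sum(np.polyval(coefs[k], X[:, k]) for k in range(d) if k != j)
            coefs[j] = np.polyfit(X[:, j], y - intercept - others, degree)
    return intercept, coefs   # plot np.polyval(coefs[j], grid) to inspect feature j's shape

# Usage with any trained model exposing a prediction function, e.g.:
# intercept, coefs = distill_additive(net_predict, X_train)
```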
Can recurrent neural networks warp time?
Title | Can recurrent neural networks warp time? |
Authors | Corentin Tallec, Yann Ollivier |
Abstract | Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms. Empirically these models have been found to improve the learning of medium to long term temporal dependencies and to help with vanishing gradient issues. We prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. We recover part of the LSTM architecture from a simple axiomatic approach. This result leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long term dependencies, with minimal implementation effort. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1804.11188v1 |
http://arxiv.org/pdf/1804.11188v1.pdf | |
PWC | https://paperswithcode.com/paper/can-recurrent-neural-networks-warp-time |
Repo | https://github.com/AravindGanesh/ChronoLSTM |
Framework | none |
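The chrono initialization can be applied to a standard PyTorch LSTM in a few lines. The sketch below assumes a single-layer LSTM and PyTorch's gate ordering (input, forget, cell, output); `t_max` stands for the longest dependency length expected in the data.

```python
import torch
import torch.nn as nn

def chrono_init(lstm: nn.LSTM, t_max: int):
    """Chrono initialization sketch: forget-gate biases drawn as log(U[1, t_max - 1])
    and input-gate biases set to their negatives, so the gates start out attuned to
    dependencies up to roughly t_max steps. PyTorch packs LSTM biases as
    [input | forget | cell | output] blocks of size hidden_size."""
    h = lstm.hidden_size
    with torch.no_grad():
        lstm.bias_ih_l0.zero_()
        lstm.bias_hh_l0.zero_()                      # effective bias is the sum of both
        b = torch.empty(h).uniform_(1.0, t_max - 1).log()
        lstm.bias_ih_l0[h:2 * h] = b                 # forget gate
        lstm.bias_ih_l0[0:h] = -b                    # input gate

lstm = nn.LSTM(input_size=32, hidden_size=64)
chrono_init(lstm, t_max=100)
```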
Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting
Title | Deep Uncertainty Quantification: A Machine Learning Approach for Weather Forecasting |
Authors | Bin Wang, Jie Lu, Zheng Yan, Huaishao Luo, Tianrui Li, Yu Zheng, Guangquan Zhang |
Abstract | Weather forecasting is usually solved through numerical weather prediction (NWP), which can sometimes lead to unsatisfactory performance due to inappropriate setting of the initial states. In this paper, we design a data-driven method augmented by an effective information fusion mechanism to learn from historical data that incorporates prior knowledge from NWP. We cast the weather forecasting problem as an end-to-end deep learning problem and solve it by proposing a novel negative log-likelihood error (NLE) loss function. A notable advantage of our proposed method is that it simultaneously implements single-value forecasting and uncertainty quantification, which we refer to as deep uncertainty quantification (DUQ). Efficient deep ensemble strategies are also explored to further improve performance. This new approach was evaluated on a public dataset collected from weather stations in Beijing, China. Experimental results demonstrate that the proposed NLE loss significantly improves generalization compared to mean squared error (MSE) loss and mean absolute error (MAE) loss. Compared with NWP, this approach significantly improves accuracy by 47.76%, which is a state-of-the-art result on this benchmark dataset. The preliminary version of the proposed method won 2nd place in an online competition for daily weather forecasting. |
Tasks | Weather Forecasting |
Published | 2018-12-22 |
URL | http://arxiv.org/abs/1812.09467v3 |
http://arxiv.org/pdf/1812.09467v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-uncertainty-quantification-a-machine |
Repo | https://github.com/BruceBinBoxing/WF |
Framework | tf |
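The loss the abstract refers to is, in spirit, a heteroscedastic negative log-likelihood: the network outputs a mean and a (log-)variance per target, and the loss trades squared error off against predicted uncertainty. The sketch below is a generic Gaussian NLL, not necessarily the paper's exact NLE parameterization.

```python
import torch

def gaussian_nll(mu, log_var, y):
    """Generic heteroscedastic negative log-likelihood (constant terms dropped):
    large errors cost less where the predicted variance is large, while the
    log-variance term penalizes blanket over-estimation of uncertainty."""
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

mu, log_var = torch.randn(8), torch.zeros(8)     # e.g. the two heads of a forecasting network
y = torch.randn(8)
loss = gaussian_nll(mu, log_var, y)
```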
Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT
Title | Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT |
Authors | Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin, Jinsong Su |
Abstract | Although neural machine translation (NMT) yields promising translation performance, it unfortunately suffers from over- and under-translation issues [Tu et al., 2016], the study of which has become a research hotspot in NMT. At present, these studies mainly apply the dominant automatic evaluation metrics, such as BLEU, to evaluate the overall translation quality with respect to both adequacy and fluency. However, they are unable to accurately measure the ability of NMT systems in dealing with the above-mentioned issues. In this paper, we propose two quantitative metrics, the Otem and Utem, to automatically evaluate the system performance in terms of over- and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between gold reference and system translation. We evaluate both metrics by comparing their scores with human evaluations, where the values of Pearson Correlation Coefficient reveal their strong correlation. Moreover, in-depth analyses on various translation systems indicate some inconsistency between BLEU and our proposed metrics, highlighting the necessity and significance of our metrics. |
Tasks | Machine Translation |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08945v1 |
http://arxiv.org/pdf/1807.08945v1.pdf | |
PWC | https://paperswithcode.com/paper/otemutem-over-and-under-translation |
Repo | https://github.com/DeepLearnXMU/Otem-Utem |
Framework | none |
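The idea behind the two metrics — scoring over- and under-translation from mismatched n-grams — can be sketched in plain Python. The exact definitions in the paper (n-gram orders, clipping, multi-reference handling) may differ; this is only a minimal single-reference illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def otem_utem(hypothesis, reference, n=2):
    """Over-translation: proportion of hypothesis n-grams exceeding their reference
    counts. Under-translation: proportion of reference n-grams missing from the
    hypothesis. Lower is better for both."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    over = sum(max(c - ref[g], 0) for g, c in hyp.items())
    under = sum(max(c - hyp[g], 0) for g, c in ref.items())
    return over / max(sum(hyp.values()), 1), under / max(sum(ref.values()), 1)

hyp = "the cat the cat sat".split()
ref = "the cat sat on the mat".split()
print(otem_utem(hyp, ref))   # (0.5, 0.6): some repetition, some missing content
```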
U-Net: Machine Reading Comprehension with Unanswerable Questions
Title | U-Net: Machine Reading Comprehension with Unanswerable Questions |
Authors | Fu Sun, Linyang Li, Xipeng Qiu, Yang Liu |
Abstract | Machine reading comprehension with unanswerable questions is a new challenging task for natural language processing. A key subtask is to reliably predict whether the question is unanswerable. In this paper, we propose a unified model, called U-Net, with three important components: answer pointer, no-answer pointer, and answer verifier. We introduce a universal node and thus process the question and its context passage as a single contiguous sequence of tokens. The universal node encodes the fused information from both the question and passage, plays an important role in predicting whether the question is answerable, and also greatly improves the conciseness of the U-Net. Different from the state-of-the-art pipeline models, U-Net can be learned in an end-to-end fashion. The experimental results on the SQuAD 2.0 dataset show that U-Net can effectively predict the unanswerability of questions and achieves an F1 score of 71.7 on SQuAD 2.0. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.06638v1 |
http://arxiv.org/pdf/1810.06638v1.pdf | |
PWC | https://paperswithcode.com/paper/u-net-machine-reading-comprehension-with |
Repo | https://github.com/FudanNLP/UNet |
Framework | pytorch |
INFODENS: An Open-source Framework for Learning Text Representations
Title | INFODENS: An Open-source Framework for Learning Text Representations |
Authors | Ahmad Taie, Raphael Rubino, Josef van Genabith |
Abstract | The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering. While engineered representations are usually based on some linguistic understanding and are therefore more interpretable, learned representations are harder to interpret. Empirically studying the complementarity of both approaches can provide more linguistic insights that would help reach a better compromise between interpretability and performance. We present INFODENS, a framework for studying learned and engineered representations of text in the context of text classification tasks. It is designed to simplify the tasks of feature engineering as well as provide the groundwork for extracting learned features and combining both approaches. INFODENS is flexible, extensible, with a short learning curve, and is easy to integrate with many of the available and widely used natural language processing tools. |
Tasks | Feature Engineering, Representation Learning, Text Classification |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07091v1 |
http://arxiv.org/pdf/1810.07091v1.pdf | |
PWC | https://paperswithcode.com/paper/infodens-an-open-source-framework-for |
Repo | https://github.com/ahmad-taie/infodens |
Framework | tf |
Spectral Inference Networks: Unifying Deep and Spectral Learning
Title | Spectral Inference Networks: Unifying Deep and Spectral Learning |
Authors | David Pfau, Stig Petersen, Ashish Agarwal, David G. T. Barrett, Kimberly L. Stachenfeld |
Abstract | We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. Spectral Inference Networks generalize Slow Feature Analysis to generic symmetric operators, and are closely related to Variational Monte Carlo methods from computational physics. As such, they can be a powerful tool for unsupervised representation learning from video or graph-structured data. We cast training Spectral Inference Networks as a bilevel optimization problem, which allows for online learning of multiple eigenfunctions. We show results of training Spectral Inference Networks on problems in quantum mechanics and feature learning for videos on synthetic datasets. Our results demonstrate that Spectral Inference Networks accurately recover eigenfunctions of linear operators and can discover interpretable representations from video in a fully unsupervised manner. |
Tasks | Atari Games, bilevel optimization, Representation Learning, Stochastic Optimization, Unsupervised Representation Learning |
Published | 2018-06-06 |
URL | https://arxiv.org/abs/1806.02215v3 |
https://arxiv.org/pdf/1806.02215v3.pdf | |
PWC | https://paperswithcode.com/paper/spectral-inference-networks-unifying-spectral |
Repo | https://github.com/deepmind/spectral_inference_networks |
Framework | tf |
Bio-YODIE: A Named Entity Linking System for Biomedical Text
Title | Bio-YODIE: A Named Entity Linking System for Biomedical Text |
Authors | Genevieve Gorrell, Xingyi Song, Angus Roberts |
Abstract | Ever-expanding volumes of biomedical text require automated semantic annotation techniques to curate and put to best use. An established field of research seeks to link mentions in text to knowledge bases such as those included in the UMLS (Unified Medical Language System), in order to enable a more sophisticated understanding. This work has yielded good results for tasks such as curating literature, but increasingly, annotation systems are more broadly applied. Medical vocabularies are expanding in size, and with them the extent of term ambiguity. Document collections are increasing in size and complexity, creating a greater need for speed and robustness. Furthermore, as the technologies are turned to new tasks, requirements change; for example greater coverage of expressions may be required in order to annotate patient records, and greater accuracy may be needed for applications that affect patients. This places new demands on the approaches currently in use. In this work, we present a new system, Bio-YODIE, and compare it to two other popular systems in order to give guidance about suitable approaches in different scenarios and how systems might be designed to accommodate future needs. |
Tasks | Entity Linking |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04860v1 |
http://arxiv.org/pdf/1811.04860v1.pdf | |
PWC | https://paperswithcode.com/paper/bio-yodie-a-named-entity-linking-system-for |
Repo | https://github.com/GateNLP/Bio-YODIE |
Framework | none |
Dynamic Graph CNN for Learning on Point Clouds
Title | Dynamic Graph CNN for Learning on Point Clouds |
Authors | Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon |
Abstract | Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNN to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks including ModelNet40, ShapeNetPart, and S3DIS. |
Tasks | |
Published | 2018-01-24 |
URL | https://arxiv.org/abs/1801.07829v2 |
https://arxiv.org/pdf/1801.07829v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-graph-cnn-for-learning-on-point |
Repo | https://github.com/AnTao97/UnsupervisedPointCloudReconstruction |
Framework | pytorch |
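A minimal, unbatched EdgeConv layer can be written directly from the description above: build a k-NN graph in feature space, form edge features from each point and its neighbour offsets, apply a shared MLP, and max-aggregate. The tensor shapes and the tiny MLP are illustrative; the linked repository contains the full batched implementation.

```python
import torch
import torch.nn as nn

def edgeconv(x, mlp, k=8):
    """Unbatched EdgeConv sketch. x: (n, c) point features; mlp maps (..., 2c) -> (..., c_out).
    The graph is recomputed from the current features, which is what makes DGCNN 'dynamic'."""
    n, c = x.shape
    dist = ((x.unsqueeze(1) - x.unsqueeze(0)) ** 2).sum(-1)        # (n, n) squared distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]           # k nearest neighbours, self dropped
    neighbors = x[idx]                                             # (n, k, c)
    center = x.unsqueeze(1).expand(-1, k, -1)                      # (n, k, c)
    edge_feats = torch.cat([center, neighbors - center], dim=-1)   # (n, k, 2c)
    return mlp(edge_feats).max(dim=1).values                       # (n, c_out)

mlp = nn.Sequential(nn.Linear(2 * 3, 64), nn.ReLU())
points = torch.randn(128, 3)
features = edgeconv(points, mlp, k=8)    # (128, 64); further edgeconv layers can be stacked on `features`
```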