Paper Group NANR 98
Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Title | Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts |
Authors | Luke Breitfeller, Emily Ahn, David Jurgens, Yulia Tsvetkov |
Abstract | Microaggressions are subtle, often veiled, manifestations of human biases. These uncivil interactions can have a powerful negative impact on people by marginalizing minorities and disadvantaged groups. The linguistic subtlety of microaggressions in communication has made it difficult for researchers to analyze their exact nature, and to quantify and extract microaggressions automatically. Specifically, the lack of a corpus of real-world microaggressions and objective criteria for annotating them have prevented researchers from addressing these problems at scale. In this paper, we devise a general but nuanced, computationally operationalizable typology of microaggressions based on a small subset of data that we have. We then create two datasets: one with examples of diverse types of microaggressions recollected by their targets, and another with gender-based microaggressions in public conversations on social media. We introduce a new, more objective, criterion for annotation and an active-learning based procedure that increases the likelihood of surfacing posts containing microaggressions. Finally, we analyze the trends that emerge from these new datasets. |
Tasks | Active Learning |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1176/ |
https://www.aclweb.org/anthology/D19-1176 | |
PWC | https://paperswithcode.com/paper/finding-microaggressions-in-the-wild-a-case |
Repo | |
Framework | |
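A minimal sketch of an annotation loop in the spirit of the paper's active-learning procedure for surfacing rare posts. The toy posts, TF-IDF features, seed set, and highest-probability query strategy are all illustrative assumptions, not the authors' actual pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy pool of posts; 1 = recalled as containing a microaggression (synthetic).
pool = ["you speak english so well", "nice weather today",
        "where are you really from", "the meeting is at noon"] * 50
labels = np.array([1, 0, 1, 0] * 50)

X = TfidfVectorizer().fit_transform(pool)
labeled = list(range(8))                       # small seed annotation set
unlabeled = [i for i in range(len(pool)) if i not in labeled]

for _ in range(5):                             # a few active-learning rounds
    clf = LogisticRegression().fit(X[labeled], labels[labeled])
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmax(probs))]   # surface the most likely positive
    labeled.append(query)                      # an annotator would label this post
    unlabeled.remove(query)
```

Querying the highest-scoring posts rather than the most uncertain ones reflects the stated goal: increasing the likelihood of surfacing microaggressions in a pool where they are rare.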
Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning
Title | Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning |
Authors | Joao F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi |
Abstract | We propose a fast second-order method that can be used as a drop-in replacement for current deep learning solvers. Compared to stochastic gradient descent (SGD), it only requires two additional forward-mode automatic differentiation operations per iteration, which has a computational cost comparable to two standard forward passes and is easy to implement. Our method addresses long-standing issues with current second-order solvers, which invert an approximate Hessian matrix every iteration, either exactly or by conjugate-gradient methods; both procedures are much slower than an SGD step. Instead, we propose to keep a single estimate of the gradient projected by the inverse Hessian matrix, and update it once per iteration with just two passes over the network. This estimate has the same size as, and is similar to, the momentum variable commonly used in SGD. No estimate of the Hessian is maintained. We first validate our method, called CurveBall, on small problems with known solutions (the noisy Rosenbrock function and degenerate 2-layer linear networks), where current deep learning solvers struggle. We then train several large models on CIFAR and ImageNet, including ResNet and VGG-f networks, where we demonstrate faster convergence with no hyperparameter tuning. We also show our optimiser's generality by testing on a large set of randomly generated architectures. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Henriques_Small_Steps_and_Giant_Leaps_Minimal_Newton_Solvers_for_Deep_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Henriques_Small_Steps_and_Giant_Leaps_Minimal_Newton_Solvers_for_Deep_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/small-steps-and-giant-leaps-minimal-newton-2 |
Repo | |
Framework | |
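A rough sketch of the update rule described in the abstract: keep one vector z that estimates the Newton step, refresh it each iteration with a single Hessian-vector product, and apply it to the weights. The fixed hyperparameters and the double-backprop Hessian-vector product are my assumptions (the paper tunes these automatically and uses forward-mode differentiation), so this is not the authors' CurveBall implementation:

```python
import torch

def rosenbrock(w):
    x, y = w[0], w[1]
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

w = torch.tensor([-0.5, 1.5], requires_grad=True)
z = torch.zeros(2)               # running estimate of the Newton step -H^{-1} g
rho, beta = 0.9, 1e-3            # hypothetical fixed hyperparameters

for step in range(5000):
    loss = rosenbrock(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (Hz,) = torch.autograd.grad(g @ z, w)    # Hessian-vector product H z
    z = rho * z - beta * (Hz + g.detach())   # momentum-like update of z
    with torch.no_grad():
        w += z                               # apply the current step estimate

print(w.detach())                            # minimum of the Rosenbrock is (1, 1)
```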
Learning Latent Semantic Representation from Pre-defined Generative Model
Title | Learning Latent Semantic Representation from Pre-defined Generative Model |
Authors | Jin-Young Kim, Sung-Bae Cho |
Abstract | Learning representations of data is an important issue in machine learning. Although GANs have led to significant improvements in data representations, they still have several problems, such as unstable training, a hidden data manifold, and huge computational overhead. A GAN tends to produce data without any information about the manifold of the data, which makes it hard to control which features are generated. Moreover, most GANs have a large latent manifold, resulting in poor scalability. In this paper, we propose a novel GAN for controlling the latent semantic representation, called LSC-GAN, which allows us to generate desired data and learns a representation of the data efficiently. Unlike conventional GAN models with a hidden latent-space distribution, we explicitly define the distributions in advance, and the model is trained to generate data with the corresponding features from latent variables that follow those distributions. Because deploying various distributions in one latent space while maintaining its dimension enlarges the scale of the latent space and makes training unstable, we separate the process of explicitly defining the distributions from the generation process. We prove that a VAE is appropriate for the former, and we modify the VAE loss function to map the data into the pre-defined latent space so that the reconstructed data lie close to the input data according to their characteristics. Moreover, we add this KL-divergence term to the loss function of LSC-GAN. The decoder of the VAE, which generates data with the corresponding features from the pre-defined latent space, is used as the generator of the LSC-GAN. Several experiments on the CelebA dataset verify that the proposed method generates desired data stably and efficiently, achieving a high compression ratio that can hold about 24 pixels of information in each dimension of the latent space. Besides, our model learns the reverse of features, such as not laughing (rather frowning), only from data of ordinary and smiling facial expressions. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Hyg1Ls0cKQ |
https://openreview.net/pdf?id=Hyg1Ls0cKQ | |
PWC | https://paperswithcode.com/paper/learning-latent-semantic-representation-from |
Repo | |
Framework | |
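The abstract's key modification is anchoring each class to a pre-defined latent distribution instead of a single N(0, I) prior. Below is a minimal sketch of such a class-conditioned KL term, assuming unit-variance Gaussian targets N(mu_c, I); all names and shapes are illustrative, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def lsc_kl(mu, logvar, class_means, labels):
    """KL( N(mu, diag(exp(logvar))) || N(mu_c, I) ), summed over latent dims."""
    target = class_means[labels]                       # (batch, latent_dim)
    var = logvar.exp()
    kl = 0.5 * (var + (mu - target) ** 2 - 1.0 - logvar).sum(dim=1)
    return kl.mean()

def vae_loss(recon, x, mu, logvar, class_means, labels):
    rec = F.mse_loss(recon, x)                         # reconstruction term
    return rec + lsc_kl(mu, logvar, class_means, labels)

# Tiny smoke test with random tensors standing in for encoder outputs.
mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)
class_means = torch.randn(10, 8)                       # one pre-defined mean per class
print(lsc_kl(mu, logvar, class_means, torch.tensor([0, 1, 2, 3])))
```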
Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Title | Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion |
Authors | Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang |
Abstract | Langevin diffusion is a powerful method for nonconvex optimization, which enables the escape from local minima by injecting noise into the gradient. In particular, the temperature parameter controlling the noise level gives rise to a tradeoff between "global exploration" and "local exploitation", which correspond to high and low temperatures, respectively. To attain the advantages of both regimes, we propose to use replica exchange, which swaps between two Langevin diffusions with different temperatures. We theoretically analyze the acceleration effect of replica exchange from two perspectives: (i) the convergence in $\chi^2$-divergence, and (ii) the large deviation principle. Such an acceleration effect allows us to approach the global minima faster. Furthermore, by discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm. For such an algorithm, we quantify its discretization error in theory and demonstrate its acceleration effect in practice. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SJfPFjA9Fm |
https://openreview.net/pdf?id=SJfPFjA9Fm | |
PWC | https://paperswithcode.com/paper/accelerating-nonconvex-learning-via-replica |
Repo | |
Framework | |
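A textbook discretization of the two-temperature scheme on a one-dimensional double-well potential. Step size, temperatures, swap schedule, and the potential itself are illustrative choices; the algorithm the authors analyze may differ in details such as the swap intensity:

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):                                  # nonconvex double-well potential
    return (x ** 2 - 1) ** 2

def grad_U(x):
    return 4 * x * (x ** 2 - 1)

eta, tau_low, tau_high = 1e-3, 0.1, 2.0    # step size and the two temperatures
x1, x2 = 2.0, -2.0                         # low- and high-temperature replicas

for _ in range(20000):
    x1 += -eta * grad_U(x1) + np.sqrt(2 * eta * tau_low) * rng.standard_normal()
    x2 += -eta * grad_U(x2) + np.sqrt(2 * eta * tau_high) * rng.standard_normal()
    # Metropolis swap: accept with prob min(1, exp((1/t1 - 1/t2)(U(x1) - U(x2))))
    log_s = (1.0 / tau_low - 1.0 / tau_high) * (U(x1) - U(x2))
    if np.log(rng.random()) < log_s:
        x1, x2 = x2, x1

print(x1)   # the low-temperature replica settles near a minimum at +/- 1
```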
Cardiff University at SemEval-2019 Task 4: Linguistic Features for Hyperpartisan News Detection
Title | Cardiff University at SemEval-2019 Task 4: Linguistic Features for Hyperpartisan News Detection |
Authors | Carla Pérez-Almendros, Luis Espinosa-Anke, Steven Schockaert |
Abstract | This paper summarizes our contribution to the Hyperpartisan News Detection task in SemEval 2019. We experiment with two different approaches: 1) an SVM classifier based on word vector averages and hand-crafted linguistic features, and 2) a BiLSTM-based neural text classifier trained on a filtered training set. Surprisingly, despite their different nature, both approaches achieve an accuracy of 0.74. The main focus of this paper is to further analyze the remarkable fact that a simple feature-based approach can perform on par with modern neural classifiers. We also highlight the effectiveness of our filtering strategy for training the neural network on a large but noisy training set. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2158/ |
https://www.aclweb.org/anthology/S19-2158 | |
PWC | https://paperswithcode.com/paper/cardiff-university-at-semeval-2019-task-4 |
Repo | |
Framework | |
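A bare-bones version of the paper's first system: average word vectors into a document vector and train a linear SVM. The tiny vocabulary, random vectors, and toy labels are stand-ins, and the paper additionally uses hand-crafted linguistic features not shown here:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=50) for w in
                ["radical", "agenda", "report", "said", "disaster", "data"]}

def doc_vector(tokens, dim=50):
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

docs = [["radical", "agenda", "disaster"], ["report", "said", "data"]] * 20
labels = np.array([1, 0] * 20)              # 1 = hyperpartisan (synthetic)

X = np.stack([doc_vector(d) for d in docs])
clf = LinearSVC(C=1.0).fit(X, labels)
print(clf.score(X, labels))
```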
AiFu at SemEval-2019 Task 10: A Symbolic and Sub-symbolic Integrated System for SAT Math Question Answering
Title | AiFu at SemEval-2019 Task 10: A Symbolic and Sub-symbolic Integrated System for SAT Math Question Answering |
Authors | Yifan Liu, Keyu Ding, Yi Zhou |
Abstract | AiFu won first place in the SemEval-2019 Task 10 "Math Question Answering" competition. This paper describes how the system works technically, and reports and analyzes some essential experimental results. |
Tasks | Question Answering |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2154/ |
https://www.aclweb.org/anthology/S19-2154 | |
PWC | https://paperswithcode.com/paper/aifu-at-semeval-2019-task-10-a-symbolic-and |
Repo | |
Framework | |
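AiFu integrates symbolic and sub-symbolic components; as a toy illustration of the symbolic side only, here is a sympy one-equation solver matched against multiple-choice options. The question format, naive parsing, and choices are all made up for this example:

```python
import sympy as sp

x = sp.symbols("x")
question = "If 2*x + 3 = 11, what is x?"
choices = {"A": 2, "B": 3, "C": 4, "D": 5}

# Crude template parse: pull out "<lhs> = <rhs>" between "If " and the comma.
lhs, rhs = question.split("If ")[1].split(",")[0].split(" = ")
solution = sp.solve(sp.Eq(sp.sympify(lhs), sp.sympify(rhs)), x)[0]

answer = next(k for k, v in choices.items() if v == solution)
print(answer)   # -> C
```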
End-to-end learning of pharmacological assays from high-resolution microscopy images
Title | End-to-end learning of pharmacological assays from high-resolution microscopy images |
Authors | Markus Hofmarcher, Elisabeth Rumetshofer, Sepp Hochreiter, Günter Klambauer |
Abstract | Predicting the outcome of pharmacological assays from high-resolution microscopy images of treated cells is a crucial task in drug discovery, one that could tremendously increase discovery rates. However, end-to-end learning on these images with convolutional neural networks (CNNs) had not previously been attempted for this task, because it was considered infeasible and overly complex. On the largest available public dataset, we compare several state-of-the-art CNNs trained in an end-to-end fashion with models based on a cell-centric approach involving segmentation. We found that CNNs operating on full images containing hundreds of cells perform significantly better at assay prediction than networks operating on a single-cell level. Surprisingly, we could predict 29% of the 209 pharmacological assays at high predictive performance (AUC > 0.9). We compared a novel CNN architecture called "GapNet" against four competing CNN architectures and found that it performs on par with the best methods while having the lowest training time. Our results demonstrate that end-to-end learning on high-resolution imaging data is not only possible but even outperforms cell-centric and segmentation-dependent approaches. Hence, the costly cell segmentation and feature extraction steps are not necessary; in fact, they even hamper predictive performance. Our work further suggests that many pharmacological assays could be replaced by high-resolution microscopy imaging together with convolutional neural networks. |
Tasks | Cell Segmentation, Drug Discovery |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1gBgnR9Y7 |
https://openreview.net/pdf?id=S1gBgnR9Y7 | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-pharmacological-assays |
Repo | |
Framework | |
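A minimal sketch of the full-image, multi-assay setup the abstract describes: one CNN over the whole microscopy image, global average pooling, and a 209-way multi-label head. The layer sizes are illustrative, and this is not the GapNet architecture itself:

```python
import torch
import torch.nn as nn

class AssayCNN(nn.Module):
    def __init__(self, n_assays=209):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(128, n_assays)

    def forward(self, x):
        h = self.features(x)
        h = h.mean(dim=(2, 3))        # global average pooling over H, W
        return self.head(h)           # one logit per assay

model = AssayCNN()
logits = model(torch.randn(2, 3, 512, 512))    # batch of full images
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (2, 209)).float())
```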
Cross-Entropy Loss Leads To Poor Margins
Title | Cross-Entropy Loss Leads To Poor Margins |
Authors | Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran |
Abstract | Neural networks could misclassify inputs that are slightly different from their training data, which indicates a small margin between their decision boundaries and the training dataset. In this work, we study the binary classification of linearly separable datasets and show that linear classifiers could also have decision boundaries that lie close to their training dataset if cross-entropy loss is used for training. In particular, we show that if the features of the training dataset lie in a low-dimensional affine subspace and the cross-entropy loss is minimized by using a gradient method, the margin between the training points and the decision boundary could be much smaller than the optimal value. This result is contrary to the conclusions of recent related works such as (Soudry et al., 2018), and we identify the reason for this contradiction. In order to improve the margin, we introduce differential training, a training paradigm that uses a loss function defined on pairs of points from each class. We show that the decision boundary of a linear classifier trained with differential training indeed achieves the maximum margin. These results reveal the use of cross-entropy loss as one of the hidden culprits of adversarial examples and introduce a new direction to make neural networks robust against them. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ByfbnsA9Km |
https://openreview.net/pdf?id=ByfbnsA9Km | |
PWC | https://paperswithcode.com/paper/cross-entropy-loss-leads-to-poor-margins |
Repo | |
Framework | |
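A small sketch of the differential-training idea as I read it from the abstract: the loss is computed on cross-class pairs, so the learned direction w depends only on differences between points from the two classes; a bias is then placed between the projected classes. The synthetic data, pairing scheme, and bias rule are illustrative assumptions, not the authors' exact procedure:

```python
import torch

torch.manual_seed(0)
pos = torch.randn(50, 2) + torch.tensor([2.0, 0.0])   # class +1
neg = torch.randn(50, 2) - torch.tensor([2.0, 0.0])   # class -1

w = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
for _ in range(500):
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)        # all cross-class pairs
    loss = torch.nn.functional.softplus(-(diff @ w)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    # Place the boundary midway between the closest projections of each class.
    b = -((pos @ w).min() + (neg @ w).max()) / 2
print(w, b)
```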
Risk Factors Extraction from Clinical Texts based on Linked Open Data
Title | Risk Factors Extraction from Clinical Texts based on Linked Open Data |
Authors | Svetla Boytcheva, Galia Angelova, Zhivko Angelov |
Abstract | This paper presents experiments in risk factor analysis based on clinical texts enhanced with Linked Open Data (LOD). The idea is to determine whether a patient has risk factors for a specific disease by analyzing only his/her outpatient records. A semantic graph of "meta-knowledge" about a disease of interest is constructed, with integrated multilingual terms (labels) of symptoms, risk factors, etc. coming from Wikidata, PubMed, Wikipedia and MeSH, and linked to clinical records of individual patients via ICD-10 codes. Then a predictive model is trained to foretell whether patients are at risk of developing the disease of interest. The testing was done using outpatient records from a nation-wide repository available for the period 2011-2016. The results show improvement in the overall performance of all tested algorithms (kNN, Naive Bayes, Tree, Logistic regression, ANN) when the clinical texts are enriched with LOD resources. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1019/ |
https://www.aclweb.org/anthology/R19-1019 | |
PWC | https://paperswithcode.com/paper/risk-factors-extraction-from-clinical-texts |
Repo | |
Framework | |
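As a hypothetical illustration of the LOD step, the snippet below pulls symptom labels for one disease from Wikidata's public SPARQL endpoint (Q12206 should be diabetes mellitus and P780 the symptoms property, to the best of my knowledge). The paper's actual pipeline integrates several sources (Wikidata, PubMed, Wikipedia, MeSH) and links them to patient records via ICD-10 codes, none of which is shown here:

```python
import requests

query = """
SELECT ?symptomLabel WHERE {
  wd:Q12206 wdt:P780 ?symptom .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
r = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "risk-factor-demo/0.1"},   # endpoint asks for a UA
    timeout=30,
)
for row in r.json()["results"]["bindings"]:
    print(row["symptomLabel"]["value"])
```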
Predicting Humorousness and Metaphor Novelty with Gaussian Process Preference Learning
Title | Predicting Humorousness and Metaphor Novelty with Gaussian Process Preference Learning |
Authors | Edwin Simpson, Erik-Lân Do Dinh, Tristan Miller, Iryna Gurevych |
Abstract | The inability to quantify key aspects of creative language is a frequent obstacle to natural language understanding. To address this, we introduce novel tasks for evaluating the creativeness of language, namely scoring and ranking text by humorousness and metaphor novelty. To sidestep the difficulty of assigning discrete labels or numeric scores, we learn from pairwise comparisons between texts. We introduce a Bayesian approach for predicting humorousness and metaphor novelty using Gaussian process preference learning (GPPL), which achieves a Spearman's ρ of 0.56 against gold using word embeddings and linguistic features. Our experiments show that given sparse, crowdsourced annotation data, ranking using GPPL outperforms best-worst scaling. We release a new dataset for evaluating humour containing 28,210 pairwise comparisons of 4,030 texts, and make our software freely available. |
Tasks | Word Embeddings |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1572/ |
https://www.aclweb.org/anthology/P19-1572 | |
PWC | https://paperswithcode.com/paper/predicting-humorousness-and-metaphor-novelty |
Repo | |
Framework | |
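GPPL itself is a Bayesian model beyond a short snippet, so below is a minimal Bradley-Terry-style stand-in that learns a latent humorousness score f(x) = w·x from the same kind of supervision, pairwise comparisons. Features, comparisons, and the learning rate are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # hypothetical text feature vectors
true_w = rng.normal(size=5)                 # hidden "funniness" direction
pairs = rng.integers(0, 100, size=(500, 2))
prefs = (X[pairs[:, 0]] @ true_w > X[pairs[:, 1]] @ true_w).astype(float)

w = np.zeros(5)
for _ in range(200):                        # gradient ascent on the log-likelihood
    d = X[pairs[:, 0]] - X[pairs[:, 1]]
    p = 1 / (1 + np.exp(-(d @ w)))          # P(first item preferred)
    w += 0.05 * d.T @ (prefs - p) / len(pairs)

ranking = np.argsort(-(X @ w))              # texts ranked by predicted score
print(ranking[:10])
```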
Empirical Linguistic Study of Sentence Embeddings
Title | Empirical Linguistic Study of Sentence Embeddings |
Authors | Katarzyna Krasnowska-Kieraś, Alina Wróblewska |
Abstract | The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. We introduce a method of analysing the content of sentence embeddings based on universal probing tasks, along with the classification datasets for two contrasting languages. We perform a series of probing and downstream experiments with different types of sentence embeddings, followed by a thorough analysis of the experimental results. Aside from dependency parser-based embeddings, linguistic information is retained best in the recently proposed LASER sentence embeddings. |
Tasks | Sentence Embeddings |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1573/ |
https://www.aclweb.org/anthology/P19-1573 | |
PWC | https://paperswithcode.com/paper/empirical-linguistic-study-of-sentence |
Repo | |
Framework | |
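The generic probing recipe the paper builds on: freeze the sentence embeddings, train a simple classifier to predict a linguistic property, and treat above-chance accuracy as evidence that the property is encoded. Synthetic embeddings and a toy "property" stand in for real data here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 512))                    # frozen sentence embeddings
labels = (emb[:, :10].sum(axis=1) > 0).astype(int)    # toy linguistic property

probe = LogisticRegression(max_iter=1000)             # deliberately simple probe
scores = cross_val_score(probe, emb, labels, cv=5)
print(scores.mean())                                  # above chance => encoded
```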
Exploring Numeracy in Word Embeddings
Title | Exploring Numeracy in Word Embeddings |
Authors | Aakanksha Naik, Abhilasha Ravichander, Carolyn Rose, Eduard Hovy |
Abstract | Word embeddings are now pervasive across NLP subfields as the de-facto method of forming text representations. In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. Numbers are ubiquitous and frequently appear in text. Inspired by cognitive studies on how humans perceive numbers, we develop an analysis framework to test how well word embeddings capture two essential properties of numbers: magnitude (e.g. 3<4) and numeration (e.g. 3=three). Our experiments reveal that most models capture an approximate notion of magnitude, but are inadequate at capturing numeration. We hope that our observations provide a starting point for the development of methods which better capture numeracy in NLP systems. |
Tasks | Word Embeddings |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1329/ |
https://www.aclweb.org/anthology/P19-1329 | |
PWC | https://paperswithcode.com/paper/exploring-numeracy-in-word-embeddings |
Repo | |
Framework | |
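A sketch of the magnitude probe described in the abstract: given the embeddings of two numbers, predict which is larger. Random vectors stand in for real number embeddings, so the probe scores near chance here; with embeddings that actually encode magnitude it would score well above 0.5:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 50))            # stand-in embeddings for numbers 0..99
pairs = rng.integers(0, 100, size=(2000, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]   # drop ties

X = np.hstack([emb[pairs[:, 0]], emb[pairs[:, 1]]])
y = (pairs[:, 0] > pairs[:, 1]).astype(int)

probe = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
print(probe.score(X[1500:], y[1500:]))      # ~0.5 for random embeddings
```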
EED: Extended Edit Distance Measure for Machine Translation
Title | EED: Extended Edit Distance Measure for Machine Translation |
Authors | Peter Stanchev, Weiyue Wang, Hermann Ney |
Abstract | Over the years a number of machine translation metrics have been developed in order to evaluate the accuracy and quality of machine-generated translations. Metrics such as BLEU and TER have been used for decades. However, with the rapid progress of machine translation systems, the need for better metrics is growing. This paper proposes an extension of the edit distance, which achieves better human correlation, whilst remaining fast, flexible and easy to understand. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5359/ |
https://www.aclweb.org/anthology/W19-5359 | |
PWC | https://paperswithcode.com/paper/eed-extended-edit-distance-measure-for |
Repo | |
Framework | |
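For orientation, here is the standard (Levenshtein) edit distance that EED builds on, word-level for readability; the paper's extension itself, which adds further operations and normalization for better human correlation, is not reproduced here:

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insert/delete/substitute, unit costs)."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

print(edit_distance("the cat sat".split(), "a cat sat".split()))  # -> 1
```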
Filtering Pseudo-References by Paraphrasing for Automatic Evaluation of Machine Translation
Title | Filtering Pseudo-References by Paraphrasing for Automatic Evaluation of Machine Translation |
Authors | Ryoma Yoshimura, Hiroki Shimanaka, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi |
Abstract | In this paper, we describe our participation in the WMT 2019 Metrics Shared Task. We propose an improved version of sentence BLEU that uses pseudo-references filtered by paraphrasing, for automatic evaluation of machine translation (MT). We use the outputs of off-the-shelf MT systems as pseudo-references, filtered by paraphrasing, in addition to a single human reference (gold reference). We use BERT fine-tuned on a paraphrase corpus to filter pseudo-references by checking their paraphrasability with the gold reference. Our experimental results on the WMT 2016 and 2017 datasets show that our method achieves higher correlation with human evaluation than the sentence BLEU (SentBLEU) baselines with a single reference and with unfiltered pseudo-references. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5360/ |
https://www.aclweb.org/anthology/W19-5360 | |
PWC | https://paperswithcode.com/paper/filtering-pseudo-references-by-paraphrasing |
Repo | |
Framework | |
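A pipeline sketch of the filtering idea using sacrebleu's sentence-level BLEU. Here is_paraphrase() is a crude lexical-overlap stand-in for the fine-tuned BERT paraphrase classifier the authors actually use:

```python
from sacrebleu import sentence_bleu

def is_paraphrase(a, b):
    # Hypothetical stand-in: keep a pseudo-reference if it shares enough
    # vocabulary with the gold reference. A real system would score the
    # pair with a fine-tuned paraphrase classifier instead.
    gold = set(b.split())
    return len(set(a.split()) & gold) / max(len(gold), 1) > 0.5

def filtered_sent_bleu(hyp, gold_ref, pseudo_refs):
    refs = [gold_ref] + [p for p in pseudo_refs if is_paraphrase(p, gold_ref)]
    return sentence_bleu(hyp, refs).score

print(filtered_sent_bleu(
    "the cat is on the mat",
    "there is a cat on the mat",
    ["a cat sits on the mat", "stock prices fell sharply"],  # 2nd gets filtered
))
```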
λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement
Title | λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement |
Authors | Xin Miao, Xin Yuan, Yunchen Pu, Vassilis Athitsos |
Abstract | We propose λ-net, which reconstructs hyperspectral images (e.g., with 24 spectral channels) from a single-shot measurement. This task is usually termed snapshot compressive-spectral imaging (SCI), which enjoys low cost, low bandwidth and a high-speed sensing rate by capturing the three-dimensional (3D) signal, i.e., (x, y, λ), with a 2D snapshot. Though proposed more than a decade ago, the poor quality and low speed of reconstruction algorithms have precluded wide application of SCI. To address this challenge, in this paper, we develop a dual-stage generative model to reconstruct the desired 3D signal in SCI, dubbed λ-net. Results on both simulation and real datasets demonstrate the significant advantages of λ-net, which leads to >4dB improvement in PSNR for real-mask-in-the-loop simulation data compared to the current state-of-the-art. Furthermore, λ-net can finish the reconstruction task in a fraction of a second instead of the hours taken by the most recently proposed DeSCI algorithm, thus speeding up reconstruction by more than 1000 times. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Miao_l-Net_Reconstruct_Hyperspectral_Images_From_a_Snapshot_Measurement_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Miao_l-Net_Reconstruct_Hyperspectral_Images_From_a_Snapshot_Measurement_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/l-net-reconstruct-hyperspectral-images-from-a |
Repo | |
Framework | |
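The snapshot measurement process that λ-net learns to invert, in a few lines of numpy: per-channel binary masks modulate the hyperspectral cube, and the modulated channels are summed into one 2-D snapshot. Sizes and masks are illustrative; only the forward model is shown, not the reconstruction network:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, L = 64, 64, 24                       # 24 spectral channels, as in the paper
cube = rng.random((H, W, L))               # ground-truth hyperspectral cube (x, y, lambda)
masks = rng.integers(0, 2, size=(H, W, L)) # binary coding masks, one per channel

snapshot = (cube * masks).sum(axis=2)      # single 2-D measurement y
print(snapshot.shape)                      # (64, 64)
```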