Paper Group AWR 120
Measuring LDA Topic Stability from Clusters of Replicated Runs
Title | Measuring LDA Topic Stability from Clusters of Replicated Runs |
Authors | Mika Mäntylä, Maëlick Claes, Umar Farooq |
Abstract | Background: Unstructured and textual data is increasing rapidly and Latent Dirichlet Allocation (LDA) topic modeling is a popular data analysis method for it. Past work suggests that instability of LDA topics may lead to systematic errors. Aim: We propose a method that relies on replicated LDA runs, clustering, and providing a stability metric for the topics. Method: We generate k LDA topics and replicate this process n times resulting in nk topics. Then we use K-medoids to cluster the nk topics into k clusters. The k clusters now represent the original LDA topics and we present them like normal LDA topics showing the ten most probable words. For the clusters, we try multiple stability metrics, out of which we recommend Rank-Biased Overlap, showing the stability of the topics inside the clusters. Results: We provide an initial validation where our method is used for 270,000 Mozilla Firefox commit messages with k=20 and n=20. We show how our topic stability metrics are related to the contents of the topics. Conclusions: Advances in text mining enable us to analyze large masses of text in software engineering but non-deterministic algorithms, such as LDA, may lead to unreplicable conclusions. Our approach makes LDA stability transparent and is also complementary rather than alternative to many prior works that focus on LDA parameter tuning. |
Tasks | |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08098v1 |
http://arxiv.org/pdf/1808.08098v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-lda-topic-stability-from-clusters |
Repo | https://github.com/M3SOulu/Measuring-LDA-Topic-Stability |
Framework | none |
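To make the recommended stability metric concrete, here is a minimal sketch of Rank-Biased Overlap between two top-word lists. It is not taken from the authors' repository: the function name `rbo_truncated`, the persistence parameter `p=0.9`, and the choice of the truncated (non-extrapolated) variant are assumptions made for illustration. It also assumes that no word repeats within a list.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two ranked lists (hypothetical helper).

    Sums the top-d agreement |A[:d] & B[:d]| / d over depths d = 1..D,
    weighted by p**(d-1), then scales by (1 - p). Because the sum stops at
    depth D, identical lists score 1 - p**D rather than exactly 1.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


# Two topics from different runs, each given as its most probable words.
topic_run1 = ["bug", "fix", "test", "crash", "build"]
topic_run2 = ["fix", "bug", "crash", "test", "merge"]
similarity = rbo_truncated(topic_run1, topic_run2)
```

Averaging such pairwise scores over the topics inside one cluster would give one possible per-cluster stability number; whether the paper aggregates this way is not stated in the abstract.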
Multilingual Extractive Reading Comprehension by Runtime Machine Translation
Title | Multilingual Extractive Reading Comprehension by Runtime Machine Translation |
Authors | Akari Asai, Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka |
Abstract | Despite recent work in Reading Comprehension (RC), progress has been mostly limited to English due to the lack of large-scale datasets in other languages. In this work, we introduce the first RC system for languages without RC training data. Given a target language without RC training data and a pivot language with RC training data (e.g. English), our method leverages existing RC resources in the pivot language by combining a competitive RC model in the pivot language with an attentive Neural Machine Translation (NMT) model. We first translate the data from the target to the pivot language, and then obtain an answer using the RC model in the pivot language. Finally, we recover the corresponding answer in the original language using soft-alignment attention scores from the NMT model. We create evaluation sets of RC data in two non-English languages, namely Japanese and French, to evaluate our method. Experimental results on these datasets show that our method significantly outperforms a back-translation baseline of a state-of-the-art product-level machine translation system. |
Tasks | Machine Translation, Reading Comprehension |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03275v2 |
http://arxiv.org/pdf/1809.03275v2.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-extractive-reading-comprehension |
Repo | https://github.com/AkariAsai/extractive_rc_by_runtime_mt |
Framework | pytorch |
Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
Title | Reasoning about Actions and State Changes by Injecting Commonsense Knowledge |
Authors | Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark |
Abstract | Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted effects of actions in the context of a paragraph can be improved in two ways: (1) by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and (2) by biasing reading with preferences from large-scale corpora (e.g., trees rarely move). Unlike earlier methods, we treat the problem as a neural structured prediction task, allowing hard and soft constraints to steer the model away from unlikely predictions. We show that the new model significantly outperforms earlier systems on a benchmark dataset for procedural text comprehension (+8% relative gain), and that it also avoids some of the nonsensical predictions that earlier systems make. |
Tasks | Reading Comprehension, Structured Prediction |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.10012v1 |
http://arxiv.org/pdf/1808.10012v1.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-about-actions-and-state-changes-by |
Repo | https://github.com/allenai/propara |
Framework | pytorch |
What Makes Reading Comprehension Questions Easier?
Title | What Makes Reading Comprehension Questions Easier? |
Authors | Saku Sugawara, Kentaro Inui, Satoshi Sekine, Akiko Aizawa |
Abstract | A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristics to split each dataset into easy and hard subsets and examine the performance of two baseline models for each of the subsets. We then manually annotate questions sampled from each subset with both validity and requisite reasoning skills to investigate which skills explain the difference between easy and hard questions. From this study, we observed that (i) the baseline performances for the hard subsets remarkably degrade compared to those of entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions. These results suggest that one might overestimate recent advances in MRC. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09384v1 |
http://arxiv.org/pdf/1808.09384v1.pdf | |
PWC | https://paperswithcode.com/paper/what-makes-reading-comprehension-questions |
Repo | https://github.com/Alab-NII/mrc-heuristics |
Framework | none |
GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings
Title | GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings |
Authors | Marek Sikora, Łukasz Wróbel, Adam Gudyś |
Abstract | This article presents GuideR, a user-guided rule induction algorithm, which overcomes the largest limitation of the existing methods: the lack of any way to introduce the user’s preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often leads to the situation in which resulting rules do not contain interesting information. We propose an induction algorithm which takes into account the user’s requirements. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool. |
Tasks | Survival Analysis |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01579v1 |
http://arxiv.org/pdf/1806.01579v1.pdf | |
PWC | https://paperswithcode.com/paper/guider-a-guided-separate-and-conquer-rule |
Repo | https://github.com/adaa-polsl/GuideR |
Framework | none |
LARNN: Linear Attention Recurrent Neural Network
Title | LARNN: Linear Attention Recurrent Neural Network |
Authors | Guillaume Chevalier |
Abstract | The Linear Attention Recurrent Neural Network (LARNN) is a recurrent attention module derived from the Long Short-Term Memory (LSTM) cell and ideas from the consciousness Recurrent Neural Network (RNN). Yes, it LARNNs. The LARNN uses attention on its past cell state values for a limited window size $k$. The formulas are also derived from the Batch Normalized LSTM (BN-LSTM) cell and the Transformer Network for its Multi-Head Attention Mechanism. The Multi-Head Attention Mechanism is used inside the cell such that it can query its own $k$ past values with the attention window. This has the effect of augmenting the rank of the tensor with the attention mechanism, such that the cell can perform complex queries to question its previous inner memories, which should augment the long short-term effect of the memory. With a clever trick, the LARNN cell with attention can be easily used inside a loop on the cell state, just like how any other Recurrent Neural Network (RNN) cell can be looped linearly through time series. This is due to the fact that its state, which is looped upon throughout time steps within time series, stores the inner states in a “first in, first out” queue which contains the $k$ most recent states and on which it is easily possible to add static positional encoding when the queue is represented as a tensor. This neural architecture yields better results than the vanilla LSTM cells. It can obtain results of 91.92% for the test accuracy, compared to the previously attained 91.65% using vanilla LSTM cells. Note that this is not to compare to other research, where up to 93.35% is obtained, but costly using 18 LSTM cells rather than with 2 to 3 cells as analyzed here. Finally, an interesting discovery is made, such that adding activation within the multi-head attention mechanism’s linear layers can yield better results in the context researched hereto. |
Tasks | Time Series |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05578v1 |
http://arxiv.org/pdf/1808.05578v1.pdf | |
PWC | https://paperswithcode.com/paper/larnn-linear-attention-recurrent-neural |
Repo | https://github.com/guillaume-chevalier/Linear-Attention-Recurrent-Neural-Network |
Framework | pytorch |
Unsupervised Typography Transfer
Title | Unsupervised Typography Transfer |
Authors | Hanfei Sun, Yiming Luo, Ziang Lu |
Abstract | Traditional methods in Chinese typography synthesis view characters as an assembly of radicals and strokes, but they rely on manual definition of the key points, which is still time-consuming. Some recent work in computer vision proposes a brand new approach: to treat every Chinese character as an independent and inseparable image, so the pre-processing and post-processing of each character can be avoided. Then with a combination of a transfer network and a discriminating network, one typography can be well transferred to another. Despite the quite satisfying performance of the model, the training process requires supervision, which means that in the training data each character in the source domain and the target domain needs to be perfectly paired. Sometimes the pairing is time-consuming, and sometimes there is no perfect pairing, such as the pairing between traditional Chinese and simplified Chinese characters. In this paper, we propose an unsupervised typography transfer method which doesn’t need pairing. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02595v1 |
http://arxiv.org/pdf/1802.02595v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-typography-transfer |
Repo | https://github.com/hanfeisun/Unsupervised-Typography-Transfer |
Framework | tf |
Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection
Title | Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection |
Authors | Enqiang Guo, Xinsha Fu, Jiawei Zhu, Min Deng, Yu Liu, Qing Zhu, Haifeng Li |
Abstract | A critical challenge in scene change detection is that noisy changes generated by varying illumination, shadows and camera viewpoint make variances of a scene difficult to define and measure since the noisy changes and semantic ones are entangled. Following the intuitive idea of detecting changes by directly comparing dissimilarities between a pair of features, we propose a novel fully Convolutional siamese metric Network (CosimNet) to measure changes by customizing implicit metrics. To learn more discriminative metrics, we utilize contrastive loss to reduce the distance between the unchanged feature pairs and to enlarge the distance between the changed feature pairs. Specifically, to address the issue of large viewpoint differences, we propose Thresholded Contrastive Loss (TCL) with a more tolerant strategy to punish noisy changes. We demonstrate the effectiveness of the proposed approach with experiments on three challenging datasets: CDnet, PCD2015, and VL-CMU-CD. Our approach is robust to many challenging conditions, such as illumination changes and large viewpoint differences caused by camera motion and zooming. In addition, we incorporate the distance metric into the segmentation framework and validate the effectiveness through visualization of change maps and feature distribution. The source code is available at https://github.com/gmayday1997/ChangeDet. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09111v3 |
http://arxiv.org/pdf/1810.09111v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-measure-change-fully |
Repo | https://github.com/gmayday1997/ChangeDet |
Framework | pytorch |
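The contrastive objective described in the abstract above (pull unchanged feature pairs together, push changed pairs apart) can be illustrated with the standard margin-based contrastive loss for a single feature pair. This is a sketch under stated assumptions, not CosimNet's code: the function name, the Euclidean distance, the margin value of 2.0, and the omission of any 1/2 scaling factor are illustrative choices, and the paper's Thresholded Contrastive Loss is not reproduced here because the abstract does not give its formula.

```python
import math


def contrastive_loss(feat_a, feat_b, changed, margin=2.0):
    """Standard contrastive loss for one feature pair (hypothetical helper).

    Unchanged pairs are penalised by their squared distance, so training
    pulls them together. Changed pairs are penalised only while they are
    closer than `margin`, so training pushes them at least that far apart.
    """
    dist = math.dist(feat_a, feat_b)  # Euclidean distance (Python 3.8+)
    if changed:
        return max(0.0, margin - dist) ** 2
    return dist ** 2


# An unchanged pair that is far apart is penalised heavily...
loss_unchanged = contrastive_loss([0.0, 0.0], [3.0, 4.0], changed=False)
# ...while a changed pair already beyond the margin costs nothing.
loss_changed = contrastive_loss([0.0, 0.0], [3.0, 4.0], changed=True)
```

In a change-detection setting this loss would be evaluated per spatial location of the two feature maps and averaged; that aggregation is left out to keep the sketch short.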
NEWMA: a new method for scalable model-free online change-point detection
Title | NEWMA: a new method for scalable model-free online change-point detection |
Authors | Nicolas Keriven, Damien Garreau, Iacopo Poli |
Abstract | We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponential Weighted Moving Average algorithm (EWMA). The proposed idea is to compute two EWMA statistics on the stream of data with different forgetting factors, and to compare them. By doing so, we show that we implicitly compare recent samples with older ones, without the need to explicitly store them. Additionally, we leverage Random Features (RFs) to efficiently use the Maximum Mean Discrepancy as a distance between distributions, furthermore exploiting recent optical hardware to compute high-dimensional RFs in near constant time. We show that our method is significantly faster than usual non-parametric methods for a given accuracy. |
Tasks | Change Point Detection, Time Series |
Published | 2018-05-21 |
URL | https://arxiv.org/abs/1805.08061v4 |
https://arxiv.org/pdf/1805.08061v4.pdf | |
PWC | https://paperswithcode.com/paper/newma-a-new-method-for-scalable-model-free |
Repo | https://github.com/lightonai/newma |
Framework | none |
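The core idea in the abstract above, two exponentially weighted moving averages with different forgetting factors whose difference serves as the detection statistic, can be sketched as follows. This is an illustration only and not the `lightonai/newma` implementation: the function name, the forgetting factors 0.2 and 0.05, the initialisation at the first sample, and the identity feature map (standing in for the random features the paper uses) are all assumptions, and the thresholding step that turns the statistic into an alarm is omitted.

```python
import math


def two_ewma_statistic(stream, fast=0.2, slow=0.05, feature_map=lambda x: x):
    """Distance between a fast and a slow EWMA of feature_map(x) (hypothetical helper).

    The fast average tracks recent samples and the slow one remembers older
    samples, so their distance grows when the distribution shifts. Only the
    two running averages are stored, never the samples themselves.
    """
    z_fast = z_slow = None
    stats = []
    for x in stream:
        phi = list(feature_map(x))
        if z_fast is None:
            # Assumption: both averages start at the first embedded sample.
            z_fast, z_slow = phi[:], phi[:]
        else:
            z_fast = [(1 - fast) * z + fast * f for z, f in zip(z_fast, phi)]
            z_slow = [(1 - slow) * z + slow * f for z, f in zip(z_slow, phi)]
        stats.append(math.dist(z_fast, z_slow))
    return stats


# A one-dimensional stream whose mean jumps from 0 to 5 at index 50.
stream = [[0.0]] * 50 + [[5.0]] * 10
stats = two_ewma_statistic(stream)
```

Before the jump the statistic stays at zero; at the first post-change sample the fast average moves by 0.2 * 5 and the slow one by 0.05 * 5, giving a gap of 0.75.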
Caveats for information bottleneck in deterministic scenarios
Title | Caveats for information bottleneck in deterministic scenarios |
Authors | Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk |
Abstract | Information bottleneck (IB) is a method for extracting information from one random variable $X$ that is relevant for predicting another random variable $Y$. To do so, IB identifies an intermediate “bottleneck” variable $T$ that has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The “IB curve” characterizes the set of bottleneck variables that achieve maximal $I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the “IB Lagrangian”, $I(Y;T) - \beta I(X;T)$. In some cases, $Y$ is a deterministic function of $X$, including many classification problems in supervised learning where the output class $Y$ is a deterministic function of the input $X$. We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $\beta$; (2) there are “uninteresting” trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when $Y$ is a small perturbation away from being a deterministic function of $X$, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset. |
Tasks | |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07593v4 |
http://arxiv.org/pdf/1808.07593v4.pdf | |
PWC | https://paperswithcode.com/paper/caveats-for-information-bottleneck-in |
Repo | https://github.com/artemyk/ibcurve |
Framework | tf |
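For readers who want to see the IB Lagrangian $I(Y;T) - \beta I(X;T)$ evaluated on a toy case where $Y$ is a deterministic function of $X$, the sketch below computes it for discrete variables. It is not from the `artemyk/ibcurve` repository: the helper names, the restriction to deterministic encoders $T = g(X)$, and the use of base-2 logarithms are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(a, b): prob}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_lagrangian(p_x, label_of, encode, beta):
    """I(Y;T) - beta * I(X;T) when Y = label_of(X) and T = encode(X).

    Both maps are deterministic here, which is a simplification: the general
    IB problem also allows stochastic encoders.
    """
    joint_xt, joint_yt = defaultdict(float), defaultdict(float)
    for x, p in p_x.items():
        t = encode(x)
        joint_xt[(x, t)] += p
        joint_yt[(label_of(x), t)] += p
    return mutual_information(joint_yt) - beta * mutual_information(joint_xt)


# X uniform on {0, 1, 2, 3}; the class label is the parity of x.
p_x = {x: 0.25 for x in range(4)}
parity = lambda x: x % 2

keep_everything = ib_lagrangian(p_x, parity, encode=lambda x: x, beta=0.5)
keep_label_only = ib_lagrangian(p_x, parity, encode=parity, beta=0.5)
```

With $T = X$ the encoder keeps 2 bits about $X$ and 1 bit about $Y$, so the Lagrangian is $1 - 0.5 \cdot 2 = 0$; with $T = Y$ it keeps 1 bit of each, giving $1 - 0.5 \cdot 1 = 0.5$.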
LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color
Title | LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color |
Authors | Ajkel Mino, Gerasimos Spanakis |
Abstract | Designing a logo is a long, complicated, and expensive process for any designer. However, recent advancements in generative algorithms provide models that could offer a possible solution. Logos are multi-modal, have very few categorical properties, and do not have a continuous latent space. Yet, conditional generative adversarial networks can be used to generate logos that could help designers in their creative process. We propose LoGAN: an improved auxiliary classifier Wasserstein generative adversarial neural network (with gradient penalty) that is able to generate logos conditioned on twelve different colors. In 768 generated instances (12 classes and 64 logos per class), when looking at the most prominent color, the conditional generation part of the model has an overall precision and recall of 0.8 and 0.7 respectively. LoGAN’s results offer a first glance at how artificial intelligence can be used to assist designers in their creative process and open promising future directions, such as including more descriptive labels which will provide a more exhaustive and easy-to-use system. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.10395v1 |
http://arxiv.org/pdf/1810.10395v1.pdf | |
PWC | https://paperswithcode.com/paper/logan-generating-logos-with-a-generative |
Repo | https://github.com/ajki/LoGAN |
Framework | tf |
Weakly Supervised Domain-Specific Color Naming Based on Attention
Title | Weakly Supervised Domain-Specific Color Naming Based on Attention |
Authors | Lu Yu, Yongmei Cheng, Joost van de Weijer |
Abstract | The majority of existing color naming methods focuses on the eleven basic color terms of the English language. However, in many applications, different sets of color names are used for the accurate description of objects. Labeling data to learn these domain-specific color names is an expensive and laborious task. Therefore, in this article we aim to learn color names from weakly labeled data. For this purpose, we add an attention branch to the color naming network. The attention branch is used to modulate the pixel-wise color naming predictions of the network. In experiments, we illustrate that the attention branch correctly identifies the relevant regions. Furthermore, we show that our method obtains state-of-the-art results for pixel-wise and image-wise classification on the EBAY dataset and is able to learn color names for various domains. |
Tasks | |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04385v1 |
http://arxiv.org/pdf/1805.04385v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-domain-specific-color |
Repo | https://github.com/yulu0724/AttentionColorName |
Framework | none |
Invisible Steganography via Generative Adversarial Networks
Title | Invisible Steganography via Generative Adversarial Networks |
Authors | Ru Zhang, Shiqi Dong, Jianyi Liu |
Abstract | Nowadays, there are plenty of works introducing convolutional neural networks (CNNs) to steganalysis and exceeding conventional steganalysis algorithms. These works have shown the improving potential of deep learning in the information hiding domain. There are also several works based on deep learning to do image steganography, but these works still have problems in capacity, invisibility and security. In this paper, we propose a novel CNN architecture named ISGAN to conceal a secret gray image into a color cover image on the sender side and exactly extract the secret image out on the receiver side. There are three contributions in our work: (i) we improve the invisibility by hiding the secret image only in the Y channel of the cover image; (ii) we introduce the generative adversarial networks to strengthen the security by minimizing the divergence between the empirical probability distributions of stego images and natural images; and (iii) in order to associate with the human visual system better, we construct a mixed loss function which is more appropriate for steganography to generate more realistic stego images and reveal better secret images. Experiment results show that ISGAN can achieve state-of-the-art performances on LFW, Pascal VOC2012 and ImageNet datasets. |
Tasks | Image Steganography |
Published | 2018-07-23 |
URL | http://arxiv.org/abs/1807.08571v3 |
http://arxiv.org/pdf/1807.08571v3.pdf | |
PWC | https://paperswithcode.com/paper/invisible-steganography-via-generative |
Repo | https://github.com/Neykah/isgan |
Framework | tf |
Unrestricted Adversarial Examples
Title | Unrestricted Adversarial Examples |
Authors | Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow |
Abstract | We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models, and try to achieve high accuracy and coverage on non-adversarial data while making no confident mistakes on adversarial inputs. Attackers try to subvert defenses by finding arbitrary unambiguous inputs where the model assigns an incorrect label with high confidence. We propose a simple unambiguous dataset (“bird-or-bicycle”) to use as part of this contest. We hope this contest will help to more comprehensively evaluate the worst-case adversarial risk of machine learning models. |
Tasks | |
Published | 2018-09-22 |
URL | http://arxiv.org/abs/1809.08352v1 |
http://arxiv.org/pdf/1809.08352v1.pdf | |
PWC | https://paperswithcode.com/paper/unrestricted-adversarial-examples |
Repo | https://github.com/google/unrestricted-adversarial-examples |
Framework | tf |
Segmentation-free Compositional $n$-gram Embedding
Title | Segmentation-free Compositional $n$-gram Embedding |
Authors | Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira |
Abstract | We propose a new type of representation learning method that models words, phrases and sentences seamlessly. Our method does not depend on word segmentation or any human-annotated resources (e.g., word dictionaries), yet it is very effective for noisy corpora written in unsegmented languages such as Chinese and Japanese. The main idea of our method is to ignore word boundaries completely (i.e., segmentation-free), and construct representations for all character $n$-grams in a raw corpus with embeddings of compositional sub-$n$-grams. Although the idea is simple, our experiments on various benchmarks and real-world datasets show the efficacy of our proposal. |
Tasks | Representation Learning, Word Embeddings |
Published | 2018-09-04 |
URL | https://arxiv.org/abs/1809.00918v2 |
https://arxiv.org/pdf/1809.00918v2.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-free-compositional-n-gram |
Repo | https://github.com/kdrl/SCNE |
Framework | none |
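The segmentation-free idea in the abstract above can be illustrated by enumerating every character $n$-gram of a raw string and composing an $n$-gram's vector from its sub-$n$-grams. This is a sketch, not the `kdrl/SCNE` code: the function names are hypothetical, and composing by summation (with unseen sub-$n$-grams contributing zero) is an assumption, since the abstract does not specify the composition function.

```python
def char_ngrams(text, n_max):
    """All contiguous character n-grams of `text` for n = 1..n_max.

    No word boundaries are consulted: every substring of each length counts.
    """
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


def compose(ngram, embeddings, dim):
    """Vector for `ngram` as the sum of its sub-n-gram embeddings (assumed rule).

    Sub-n-grams missing from `embeddings` contribute a zero vector.
    """
    vec = [0.0] * dim
    for sub in char_ngrams(ngram, len(ngram)):  # includes the n-gram itself
        for j, value in enumerate(embeddings.get(sub, [0.0] * dim)):
            vec[j] += value
    return vec


# Toy two-dimensional embeddings for the sub-n-grams of "ab".
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "ab": [1.0, 1.0]}
units = char_ngrams("abc", 2)
vector = compose("ab", toy_embeddings, dim=2)
```

Here `units` lists the unigrams and bigrams of "abc" in order, and `vector` adds the embeddings of "a", "b", and "ab".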