October 21, 2019

Paper Group AWR 120

Measuring LDA Topic Stability from Clusters of Replicated Runs. Multilingual Extractive Reading Comprehension by Runtime Machine Translation. Reasoning about Actions and State Changes by Injecting Commonsense Knowledge. What Makes Reading Comprehension Questions Easier?. GuideR: a guided separate-and-conquer rule learning in classification, regress …

Measuring LDA Topic Stability from Clusters of Replicated Runs


Title	Measuring LDA Topic Stability from Clusters of Replicated Runs
Authors	Mika Mäntylä, Maëlick Claes, Umar Farooq
Abstract	Background: Unstructured and textual data is increasing rapidly and Latent Dirichlet Allocation (LDA) topic modeling is a popular data analysis method for it. Past work suggests that instability of LDA topics may lead to systematic errors. Aim: We propose a method that relies on replicated LDA runs, clustering, and providing a stability metric for the topics. Method: We generate k LDA topics and replicate this process n times resulting in nk topics. Then we use K-medoids to cluster the nk topics into k clusters. The k clusters now represent the original LDA topics and we present them like normal LDA topics showing the ten most probable words. For the clusters, we try multiple stability metrics, out of which we recommend Rank-Biased Overlap, showing the stability of the topics inside the clusters. Results: We provide an initial validation where our method is used for 270,000 Mozilla Firefox commit messages with k=20 and n=20. We show how our topic stability metrics are related to the contents of the topics. Conclusions: Advances in text mining enable us to analyze large masses of text in software engineering but non-deterministic algorithms, such as LDA, may lead to unreplicable conclusions. Our approach makes LDA stability transparent and is also complementary rather than alternative to many prior works that focus on LDA parameter tuning.
Tasks
Published	2018-08-24
URL	http://arxiv.org/abs/1808.08098v1
PDF	http://arxiv.org/pdf/1808.08098v1.pdf
PWC	https://paperswithcode.com/paper/measuring-lda-topic-stability-from-clusters
Repo	https://github.com/M3SOulu/Measuring-LDA-Topic-Stability
Framework	none
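
The abstract recommends Rank-Biased Overlap (RBO) for scoring how stable the topics inside a cluster are. The following is a minimal sketch of a truncated, non-extrapolated RBO between two ranked top-word lists. The function name, the persistence value p=0.9, and the example word lists are illustrative assumptions; the abstract does not say which RBO variant or parameters the authors use.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two ranked lists without duplicates.

    Sums the agreement at each depth d (overlap of the two top-d prefixes
    divided by d), weighted by p**(d-1), and scales by (1 - p). Because the
    sum stops at the shorter list length D, identical lists score 1 - p**D
    rather than 1.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    weighted_agreement = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        overlap = len(seen_a & seen_b)
        weighted_agreement += p ** (d - 1) * overlap / d
    return (1 - p) * weighted_agreement


# Hypothetical top words of two topics from different LDA runs.
topic_run_1 = ["bug", "fix", "test", "crash", "build"]
topic_run_2 = ["fix", "bug", "crash", "test", "merge"]
similarity = rbo_truncated(topic_run_1, topic_run_2)
```

Higher values mean the two runs agree more on the top-ranked words, with disagreements near the top of the lists penalised more than disagreements further down.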

Multilingual Extractive Reading Comprehension by Runtime Machine Translation


Title	Multilingual Extractive Reading Comprehension by Runtime Machine Translation
Authors	Akari Asai, Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka
Abstract	Despite recent work in Reading Comprehension (RC), progress has been mostly limited to English due to the lack of large-scale datasets in other languages. In this work, we introduce the first RC system for languages without RC training data. Given a target language without RC training data and a pivot language with RC training data (e.g. English), our method leverages existing RC resources in the pivot language by combining a competitive RC model in the pivot language with an attentive Neural Machine Translation (NMT) model. We first translate the data from the target to the pivot language, and then obtain an answer using the RC model in the pivot language. Finally, we recover the corresponding answer in the original language using soft-alignment attention scores from the NMT model. We create evaluation sets of RC data in two non-English languages, namely Japanese and French, to evaluate our method. Experimental results on these datasets show that our method significantly outperforms a back-translation baseline of a state-of-the-art product-level machine translation system.
Tasks	Machine Translation, Reading Comprehension
Published	2018-09-10
URL	http://arxiv.org/abs/1809.03275v2
PDF	http://arxiv.org/pdf/1809.03275v2.pdf
PWC	https://paperswithcode.com/paper/multilingual-extractive-reading-comprehension
Repo	https://github.com/AkariAsai/extractive_rc_by_runtime_mt
Framework	pytorch
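
The last step described in the abstract maps an answer found in the pivot-language text back to the original language using attention scores. The sketch below shows one simple way this could be done: each pivot answer token is aligned to the source token it attends to most, and the covering span is returned. The function name, the matrix layout, and the argmax-plus-covering-span rule are assumptions for illustration, not the authors' exact procedure.

```python
def recover_source_span(attention, answer_start, answer_end):
    """Map a pivot-language answer span back to source-token indices.

    attention[i][j] is assumed to be the weight that pivot token i places
    on source token j. answer_start and answer_end are inclusive indices
    into the pivot tokens.
    """
    aligned = []
    for row in attention[answer_start:answer_end + 1]:
        # Index of the source token with the highest attention weight.
        aligned.append(max(range(len(row)), key=row.__getitem__))
    return min(aligned), max(aligned)


# Hypothetical attention matrix: 3 pivot tokens over 4 source tokens.
attention = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.1, 0.2, 0.7],
]
span = recover_source_span(attention, 1, 2)
```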

Reasoning about Actions and State Changes by Injecting Commonsense Knowledge


Title	Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
Authors	Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark
Abstract	Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted effects of actions in the context of a paragraph can be improved in two ways: (1) by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and (2) by biasing reading with preferences from large-scale corpora (e.g., trees rarely move). Unlike earlier methods, we treat the problem as a neural structured prediction task, allowing hard and soft constraints to steer the model away from unlikely predictions. We show that the new model significantly outperforms earlier systems on a benchmark dataset for procedural text comprehension (+8% relative gain), and that it also avoids some of the nonsensical predictions that earlier systems make.
Tasks	Reading Comprehension, Structured Prediction
Published	2018-08-29
URL	http://arxiv.org/abs/1808.10012v1
PDF	http://arxiv.org/pdf/1808.10012v1.pdf
PWC	https://paperswithcode.com/paper/reasoning-about-actions-and-state-changes-by
Repo	https://github.com/allenai/propara
Framework	pytorch

What Makes Reading Comprehension Questions Easier?


Title	What Makes Reading Comprehension Questions Easier?
Authors	Saku Sugawara, Kentaro Inui, Satoshi Sekine, Akiko Aizawa
Abstract	A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristics to split each dataset into easy and hard subsets and examine the performance of two baseline models for each of the subsets. We then manually annotate questions sampled from each subset with both validity and requisite reasoning skills to investigate which skills explain the difference between easy and hard questions. From this study, we observed that (i) the baseline performances for the hard subsets remarkably degrade compared to those of entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions. These results suggest that one might overestimate recent advances in MRC.
Tasks	Machine Reading Comprehension, Reading Comprehension
Published	2018-08-28
URL	http://arxiv.org/abs/1808.09384v1
PDF	http://arxiv.org/pdf/1808.09384v1.pdf
PWC	https://paperswithcode.com/paper/what-makes-reading-comprehension-questions
Repo	https://github.com/Alab-NII/mrc-heuristics
Framework	none

GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings


Title	GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings
Authors	Marek Sikora, Łukasz Wróbel, Adam Gudyś
Abstract	This article presents GuideR, a user-guided rule induction algorithm, which overcomes the largest limitation of the existing methods: the lack of any way to introduce the user’s preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often leads to a situation in which the resulting rules do not contain interesting information. We propose an induction algorithm which takes into account the user’s requirements. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool.
Tasks	Survival Analysis
Published	2018-06-05
URL	http://arxiv.org/abs/1806.01579v1
PDF	http://arxiv.org/pdf/1806.01579v1.pdf
PWC	https://paperswithcode.com/paper/guider-a-guided-separate-and-conquer-rule
Repo	https://github.com/adaa-polsl/GuideR
Framework	none

LARNN: Linear Attention Recurrent Neural Network


Title	LARNN: Linear Attention Recurrent Neural Network
Authors	Guillaume Chevalier
Abstract	The Linear Attention Recurrent Neural Network (LARNN) is a recurrent attention module derived from the Long Short-Term Memory (LSTM) cell and ideas from the consciousness Recurrent Neural Network (RNN). Yes, it LARNNs. The LARNN uses attention on its past cell state values for a limited window size $k$. The formulas are also derived from the Batch Normalized LSTM (BN-LSTM) cell and the Transformer Network for its Multi-Head Attention Mechanism. The Multi-Head Attention Mechanism is used inside the cell such that it can query its own $k$ past values with the attention window. This has the effect of augmenting the rank of the tensor with the attention mechanism, such that the cell can perform complex queries to question its previous inner memories, which should augment the long short-term effect of the memory. With a clever trick, the LARNN cell with attention can be easily used inside a loop on the cell state, just like how any other Recurrent Neural Network (RNN) cell can be looped linearly through time series. This is because its state, which is looped upon throughout time steps within time series, stores the inner states in a “first in, first out” queue which contains the $k$ most recent states and to which static positional encoding can easily be added when the queue is represented as a tensor. This neural architecture yields better results than the vanilla LSTM cells. It can obtain a test accuracy of 91.92%, compared to the previously attained 91.65% using vanilla LSTM cells. Note that this is not meant as a comparison with other research, where up to 93.35% is obtained, but at the cost of using 18 LSTM cells rather than the 2 to 3 cells analyzed here. Finally, an interesting discovery is made: adding activation within the multi-head attention mechanism’s linear layers can yield better results in the context studied here.
Tasks	Time Series
Published	2018-08-16
URL	http://arxiv.org/abs/1808.05578v1
PDF	http://arxiv.org/pdf/1808.05578v1.pdf
PWC	https://paperswithcode.com/paper/larnn-linear-attention-recurrent-neural
Repo	https://github.com/guillaume-chevalier/Linear-Attention-Recurrent-Neural-Network
Framework	pytorch
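
The abstract describes a “first in, first out” queue holding the $k$ most recent cell states, which the cell queries with attention. The sketch below illustrates only that idea in plain Python: a bounded queue plus single-head scaled dot-product attention. It is a simplification under stated assumptions (one head, no learned projections, no positional encoding, no LSTM gates), and the function name and example values are hypothetical rather than taken from the LARNN code.

```python
import math
from collections import deque


def attend(query, states):
    """Single-head scaled dot-product attention over a list of past states."""
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, state)) / scale for state in states]
    # Softmax with the maximum subtracted for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the stored states.
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(len(query))
    ]
    return context, weights


k = 3                      # attention window size
window = deque(maxlen=k)   # FIFO queue: the oldest state is dropped automatically
for step in range(5):
    window.append([float(step), 1.0])  # stand-in for a cell state at this step

context, weights = attend([1.0, 0.0], list(window))
```

Using `deque(maxlen=k)` keeps memory bounded by the window size, which is what lets the cell be looped over a time series like any other recurrent cell.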

Unsupervised Typography Transfer


Title	Unsupervised Typography Transfer
Authors	Hanfei Sun, Yiming Luo, Ziang Lu
Abstract	Traditional methods in Chinese typography synthesis view characters as an assembly of radicals and strokes, but they rely on manual definition of the key points, which is still time-consuming. Some recent work in computer vision proposes a brand new approach: to treat every Chinese character as an independent and inseparable image, so the pre-processing and post-processing of each character can be avoided. Then with a combination of a transfer network and a discriminating network, one typography can be well transferred to another. Despite the quite satisfying performance of the model, the training process needs to be supervised, which means that in the training data each character in the source domain and the target domain needs to be perfectly paired. Sometimes the pairing is time-consuming, and sometimes there is no perfect pairing, such as the pairing between traditional Chinese and simplified Chinese characters. In this paper, we propose an unsupervised typography transfer method which doesn’t need pairing.
Tasks
Published	2018-02-07
URL	http://arxiv.org/abs/1802.02595v1
PDF	http://arxiv.org/pdf/1802.02595v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-typography-transfer
Repo	https://github.com/hanfeisun/Unsupervised-Typography-Transfer
Framework	tf

Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection


Title	Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection
Authors	Enqiang Guo, Xinsha Fu, Jiawei Zhu, Min Deng, Yu Liu, Qing Zhu, Haifeng Li
Abstract	A critical challenge in scene change detection is that noisy changes generated by varying illumination, shadows and camera viewpoint make variances of a scene difficult to define and measure, since the noisy changes and semantic ones are entangled. Following the intuitive idea of detecting changes by directly comparing dissimilarities between a pair of features, we propose a novel fully Convolutional siamese metric Network (CosimNet) to measure changes by customizing implicit metrics. To learn more discriminative metrics, we utilize contrastive loss to reduce the distance between the unchanged feature pairs and to enlarge the distance between the changed feature pairs. Specifically, to address the issue of large viewpoint differences, we propose Thresholded Contrastive Loss (TCL) with a more tolerant strategy to punish noisy changes. We demonstrate the effectiveness of the proposed approach with experiments on three challenging datasets: CDnet, PCD2015, and VL-CMU-CD. Our approach is robust to many challenging conditions, such as illumination changes and large viewpoint differences caused by camera motion and zooming. In addition, we incorporate the distance metric into the segmentation framework and validate the effectiveness through visualization of change maps and feature distribution. The source code is available at https://github.com/gmayday1997/ChangeDet.
Tasks
Published	2018-10-22
URL	http://arxiv.org/abs/1810.09111v3
PDF	http://arxiv.org/pdf/1810.09111v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-measure-change-fully
Repo	https://github.com/gmayday1997/ChangeDet
Framework	pytorch
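
The abstract contrasts a standard contrastive loss with a more tolerant thresholded variant. The sketch below shows one plausible per-pair form: changed pairs are pushed apart up to a margin, while unchanged pairs are penalised only once their distance exceeds a tolerance threshold. The exact formula, the function name, and the default margin and threshold values are assumptions for illustration and are not confirmed by the abstract.

```python
def thresholded_contrastive_loss(distance, changed, margin=2.0, threshold=0.1):
    """Per-pair loss on the distance between two feature vectors.

    changed=True  : penalise pairs that are closer than `margin`.
    changed=False : penalise pairs only beyond `threshold`, so small
                    distances caused by noise are tolerated.
    Setting threshold=0 gives the ordinary contrastive loss.
    """
    if changed:
        return max(margin - distance, 0.0) ** 2
    return max(distance - threshold, 0.0) ** 2


# An unchanged pair with a small, noise-level distance incurs no penalty.
noise_penalty = thresholded_contrastive_loss(0.05, changed=False)
# A changed pair that is still too close is penalised.
close_changed_penalty = thresholded_contrastive_loss(1.0, changed=True)
```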

NEWMA: a new method for scalable model-free online change-point detection


Title	NEWMA: a new method for scalable model-free online change-point detection
Authors	Nicolas Keriven, Damien Garreau, Iacopo Poli
Abstract	We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponential Weighted Moving Average algorithm (EWMA). The proposed idea is to compute two EWMA statistics on the stream of data with different forgetting factors, and to compare them. By doing so, we show that we implicitly compare recent samples with older ones, without the need to explicitly store them. Additionally, we leverage Random Features (RFs) to efficiently use the Maximum Mean Discrepancy as a distance between distributions, furthermore exploiting recent optical hardware to compute high-dimensional RFs in near constant time. We show that our method is significantly faster than usual non-parametric methods for a given accuracy.
Tasks	Change Point Detection, Time Series
Published	2018-05-21
URL	https://arxiv.org/abs/1805.08061v4
PDF	https://arxiv.org/pdf/1805.08061v4.pdf
PWC	https://paperswithcode.com/paper/newma-a-new-method-for-scalable-model-free
Repo	https://github.com/lightonai/newma
Framework	none
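
The core idea in the abstract is to run two exponentially weighted moving averages with different forgetting factors over the stream and compare them. A minimal sketch of that comparison follows. It applies the averages to the raw samples, whereas the paper applies them to random features; the function name, the forgetting factors 0.2 and 0.05, and the initialisation with the first sample are illustrative assumptions.

```python
import math


def newma_statistics(stream, fast=0.2, slow=0.05):
    """Return the distance between a fast and a slow EWMA after each sample.

    The fast average tracks recent samples, the slow one remembers older
    ones, so a large distance suggests the distribution has changed. No
    past samples are stored, only the two running averages.
    """
    fast_avg = None
    slow_avg = None
    distances = []
    for sample in stream:
        if fast_avg is None:
            # Assumption: start both averages at the first sample.
            fast_avg = list(sample)
            slow_avg = list(sample)
        else:
            fast_avg = [(1 - fast) * a + fast * x for a, x in zip(fast_avg, sample)]
            slow_avg = [(1 - slow) * a + slow * x for a, x in zip(slow_avg, sample)]
        gap = math.sqrt(sum((f - s) ** 2 for f, s in zip(fast_avg, slow_avg)))
        distances.append(gap)
    return distances


# Synthetic two-dimensional stream with an abrupt change at index 50.
stream = [[0.0, 0.0]] * 50 + [[5.0, 5.0]] * 5
stats = newma_statistics(stream)
```

In practice a change would be flagged when the statistic crosses a threshold; how that threshold is chosen is outside the scope of this sketch.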

Caveats for information bottleneck in deterministic scenarios


Title	Caveats for information bottleneck in deterministic scenarios
Authors	Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk
Abstract	Information bottleneck (IB) is a method for extracting information from one random variable $X$ that is relevant for predicting another random variable $Y$. To do so, IB identifies an intermediate “bottleneck” variable $T$ that has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The “IB curve” characterizes the set of bottleneck variables that achieve maximal $I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the “IB Lagrangian”, $I(Y;T) - \beta I(X;T)$. In some cases, $Y$ is a deterministic function of $X$, including many classification problems in supervised learning where the output class $Y$ is a deterministic function of the input $X$. We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $\beta$; (2) there are “uninteresting” trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when $Y$ is a small perturbation away from being a deterministic function of $X$, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.
Tasks
Published	2018-08-23
URL	http://arxiv.org/abs/1808.07593v4
PDF	http://arxiv.org/pdf/1808.07593v4.pdf
PWC	https://paperswithcode.com/paper/caveats-for-information-bottleneck-in
Repo	https://github.com/artemyk/ibcurve
Framework	tf
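
The IB Lagrangian $I(Y;T) - \beta I(X;T)$ from the abstract can be evaluated directly for small discrete problems. The sketch below does this for a toy case where $Y$ is a deterministic function of $X$, comparing the encoder $T = Y$ with a trivial constant encoder. The function names and the toy distribution are assumptions for illustration, and mutual information is measured in bits.

```python
import math


def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as a 2D list."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (row_marginal[i] * col_marginal[j]))
    return total


def ib_lagrangian(p_x, p_y_given_x, p_t_given_x, beta):
    """Evaluate I(Y;T) - beta * I(X;T), with T depending on Y only through X."""
    n_x, n_y, n_t = len(p_x), len(p_y_given_x[0]), len(p_t_given_x[0])
    joint_xt = [[p_x[x] * p_t_given_x[x][t] for t in range(n_t)] for x in range(n_x)]
    joint_yt = [
        [sum(p_x[x] * p_y_given_x[x][y] * p_t_given_x[x][t] for x in range(n_x))
         for t in range(n_t)]
        for y in range(n_y)
    ]
    return mutual_information(joint_yt) - beta * mutual_information(joint_xt)


# Toy case: X uniform on {0,1,2,3} and Y = X // 2, a deterministic function of X.
p_x = [0.25, 0.25, 0.25, 0.25]
p_y_given_x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

copy_label = ib_lagrangian(p_x, p_y_given_x, p_y_given_x, beta=0.5)   # encoder T = Y
constant = ib_lagrangian(p_x, p_y_given_x, [[1.0]] * 4, beta=0.5)     # trivial T
```

With $T = Y$, both mutual informations equal $H(Y) = 1$ bit, so the Lagrangian is $1 - \beta$; the constant encoder gives zero for any $\beta$.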

LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color


Title	LoGAN: Generating Logos with a Generative Adversarial Neural Network Conditioned on color
Authors	Ajkel Mino, Gerasimos Spanakis
Abstract	Designing a logo is a long, complicated, and expensive process for any designer. However, recent advancements in generative algorithms provide models that could offer a possible solution. Logos are multi-modal, have very few categorical properties, and do not have a continuous latent space. Yet, conditional generative adversarial networks can be used to generate logos that could help designers in their creative process. We propose LoGAN: an improved auxiliary classifier Wasserstein generative adversarial neural network (with gradient penalty) that is able to generate logos conditioned on twelve different colors. In 768 generated instances (12 classes and 64 logos per class), when looking at the most prominent color, the conditional generation part of the model has an overall precision and recall of 0.8 and 0.7 respectively. LoGAN’s results offer a first glance at how artificial intelligence can be used to assist designers in their creative process and open promising future directions, such as including more descriptive labels which will provide a more exhaustive and easy-to-use system.
Tasks
Published	2018-10-23
URL	http://arxiv.org/abs/1810.10395v1
PDF	http://arxiv.org/pdf/1810.10395v1.pdf
PWC	https://paperswithcode.com/paper/logan-generating-logos-with-a-generative
Repo	https://github.com/ajki/LoGAN
Framework	tf

Weakly Supervised Domain-Specific Color Naming Based on Attention


Title	Weakly Supervised Domain-Specific Color Naming Based on Attention
Authors	Lu Yu, Yongmei Cheng, Joost van de Weijer
Abstract	The majority of existing color naming methods focuses on the eleven basic color terms of the English language. However, in many applications, different sets of color names are used for the accurate description of objects. Labeling data to learn these domain-specific color names is an expensive and laborious task. Therefore, in this article we aim to learn color names from weakly labeled data. For this purpose, we add an attention branch to the color naming network. The attention branch is used to modulate the pixel-wise color naming predictions of the network. In experiments, we illustrate that the attention branch correctly identifies the relevant regions. Furthermore, we show that our method obtains state-of-the-art results for pixel-wise and image-wise classification on the EBAY dataset and is able to learn color names for various domains.
Tasks
Published	2018-05-11
URL	http://arxiv.org/abs/1805.04385v1
PDF	http://arxiv.org/pdf/1805.04385v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-domain-specific-color
Repo	https://github.com/yulu0724/AttentionColorName
Framework	none

Invisible Steganography via Generative Adversarial Networks


Title	Invisible Steganography via Generative Adversarial Networks
Authors	Ru Zhang, Shiqi Dong, Jianyi Liu
Abstract	Nowadays, there are plenty of works introducing convolutional neural networks (CNNs) to steganalysis and exceeding conventional steganalysis algorithms. These works have shown the potential of deep learning in the information hiding domain. There are also several works based on deep learning to do image steganography, but these works still have problems in capacity, invisibility and security. In this paper, we propose a novel CNN architecture named ISGAN to conceal a secret gray image in a color cover image on the sender side and exactly extract the secret image on the receiver side. There are three contributions in our work: (i) we improve the invisibility by hiding the secret image only in the Y channel of the cover image; (ii) we introduce generative adversarial networks to strengthen the security by minimizing the divergence between the empirical probability distributions of stego images and natural images; (iii) in order to better match the human visual system, we construct a mixed loss function which is more appropriate for steganography, to generate more realistic stego images and reveal better secret images. Experimental results show that ISGAN can achieve state-of-the-art performance on the LFW, Pascal VOC2012 and ImageNet datasets.
Tasks	Image Steganography
Published	2018-07-23
URL	http://arxiv.org/abs/1807.08571v3
PDF	http://arxiv.org/pdf/1807.08571v3.pdf
PWC	https://paperswithcode.com/paper/invisible-steganography-via-generative
Repo	https://github.com/Neykah/isgan
Framework	tf

Unrestricted Adversarial Examples


Title	Unrestricted Adversarial Examples
Authors	Tom B. Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, Ian Goodfellow
Abstract	We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models, and try to achieve high accuracy and coverage on non-adversarial data while making no confident mistakes on adversarial inputs. Attackers try to subvert defenses by finding arbitrary unambiguous inputs where the model assigns an incorrect label with high confidence. We propose a simple unambiguous dataset (“bird-or-bicycle”) to use as part of this contest. We hope this contest will help to more comprehensively evaluate the worst-case adversarial risk of machine learning models.
Tasks
Published	2018-09-22
URL	http://arxiv.org/abs/1809.08352v1
PDF	http://arxiv.org/pdf/1809.08352v1.pdf
PWC	https://paperswithcode.com/paper/unrestricted-adversarial-examples
Repo	https://github.com/google/unrestricted-adversarial-examples
Framework	tf

Segmentation-free Compositional $n$-gram Embedding


Title	Segmentation-free Compositional $n$-gram Embedding
Authors	Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira
Abstract	We propose a new type of representation learning method that models words, phrases and sentences seamlessly. Our method does not depend on word segmentation or any human-annotated resources (e.g., word dictionaries), yet it is very effective for noisy corpora written in unsegmented languages such as Chinese and Japanese. The main idea of our method is to ignore word boundaries completely (i.e., segmentation-free), and construct representations for all character $n$-grams in a raw corpus with embeddings of compositional sub-$n$-grams. Although the idea is simple, our experiments on various benchmarks and real-world datasets show the efficacy of our proposal.
Tasks	Representation Learning, Word Embeddings
Published	2018-09-04
URL	https://arxiv.org/abs/1809.00918v2
PDF	https://arxiv.org/pdf/1809.00918v2.pdf
PWC	https://paperswithcode.com/paper/segmentation-free-compositional-n-gram
Repo	https://github.com/kdrl/SCNE
Framework	none
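
The abstract describes building a representation for any character $n$-gram from the embeddings of its sub-$n$-grams, with no word segmentation. The sketch below illustrates one simple reading of that idea: enumerate every contiguous sub-$n$-gram up to a maximum length and sum the embeddings that exist. The summation rule, the function names, and the tiny embedding table are assumptions for illustration and may differ from the authors' model.

```python
def sub_ngrams(text, max_n):
    """All contiguous character n-grams of `text` with 1 <= n <= max_n."""
    return [
        text[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(text) - n + 1)
    ]


def compose_embedding(text, table, max_n, dim):
    """Sum the embeddings of every sub-n-gram of `text` found in `table`."""
    vector = [0.0] * dim
    for gram in sub_ngrams(text, max_n):
        embedding = table.get(gram)
        if embedding is None:
            continue  # n-grams without a learned embedding contribute nothing
        for k in range(dim):
            vector[k] += embedding[k]
    return vector


# Hypothetical two-dimensional embedding table keyed by character n-gram.
table = {"a": [1.0, 0.0], "ab": [0.0, 2.0], "c": [0.5, 0.5]}
vector = compose_embedding("abc", table, max_n=2, dim=2)
```

Because the enumeration works on raw characters, the same function applies unchanged to text in unsegmented languages such as Chinese or Japanese.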