January 26, 2020

3372 words 16 mins read

Paper Group ANR 1572

The Missing Data Encoder: Cross-Channel Image Completion with Hide-And-Seek Adversarial Network

Title The Missing Data Encoder: Cross-Channel Image Completion with Hide-And-Seek Adversarial Network
Authors Arnaud Dapogny, Matthieu Cord, Patrick Perez
Abstract Image completion is the problem of generating whole images from fragments only. It encompasses inpainting (generating a patch given its surrounding), reverse inpainting/extrapolation (generating the periphery given the central patch) as well as colorization (generating one or several channels given other ones). In this paper, we employ a deep network to perform image completion, with adversarial training as well as perceptual and completion losses, and call it the "missing data encoder" (MDE). We consider several configurations based on how the seed fragments are chosen. We show that training MDE for "random extrapolation and colorization" (MDE-REC), i.e. using random channel-independent fragments, allows a better capture of the image semantics and geometry. MDE training makes use of a novel "hide-and-seek" adversarial loss, where the discriminator seeks the original non-masked regions, while the generator tries to hide them. We validate our models both qualitatively and quantitatively on several datasets, showing their interest for image completion, unsupervised representation learning as well as face occlusion handling.
Tasks Colorization, Representation Learning, Unsupervised Representation Learning
Published 2019-05-06
URL https://arxiv.org/abs/1905.01861v1
PDF https://arxiv.org/pdf/1905.01861v1.pdf
PWC https://paperswithcode.com/paper/the-missing-data-encoder-cross-channel-image
Repo
Framework
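
The paper ships no snippet here, but the MDE-REC masking scheme is easy to picture: each colour channel independently keeps one random fragment and loses everything else, and the network must complete the rest. A minimal NumPy sketch of one possible reading (the fragment shape, keep fraction, and function name are our own, not the authors' code):

```python
import numpy as np

def mde_rec_mask(image, keep_frac=0.25, rng=None):
    """Keep one random rectangular fragment per channel, independently,
    and zero out the rest. Illustrative reading of MDE-REC's
    channel-independent seed fragments."""
    rng = rng or np.random.default_rng()
    h, w, c = image.shape
    side_h = max(1, int(h * keep_frac ** 0.5))
    side_w = max(1, int(w * keep_frac ** 0.5))
    masked = np.zeros_like(image)
    for ch in range(c):  # each channel gets its own seed fragment
        y = rng.integers(0, h - side_h + 1)
        x = rng.integers(0, w - side_w + 1)
        masked[y:y + side_h, x:x + side_w, ch] = image[y:y + side_h, x:x + side_w, ch]
    return masked
```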

Scheduling optimization of parallel linear algebra algorithms using Supervised Learning

Title Scheduling optimization of parallel linear algebra algorithms using Supervised Learning
Authors G. Laberge, S. Shirzad, P. Diehl, H. Kaiser, S. Prudhomme, A. Lemoine
Abstract Linear algebra algorithms are used widely in a variety of domains, e.g. machine learning, numerical physics and video game graphics. For all these applications, loop-level parallelism is required to achieve high performance. However, finding the optimal way to schedule the workload between threads is a non-trivial problem because it depends on the structure of the algorithm being parallelized and the hardware the executable is run on. In the realm of Asynchronous Many Task runtime systems, a key aspect of the scheduling problem is predicting the proper chunk-size, where the chunk-size is defined as the number of iterations of a for-loop assigned to a thread as one task. In this paper, we study the application of supervised learning models to predict the chunk-size which yields maximum performance on multiple parallel linear algebra operations, using the HPX backend of Blaze’s linear algebra library. More precisely, we generate our training and test sets by measuring the performance of the application with different chunk-sizes for multiple linear algebra operations: vector addition, matrix-vector multiplication, matrix-matrix addition and matrix-matrix multiplication. We compare the use of logistic regression, neural networks and decision trees with a newly developed decision-tree-based model in order to predict the optimal value of the chunk-size. Our results show that classical decision trees and our custom decision tree model are able to forecast a chunk-size which results in good performance for the linear algebra operations.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.03947v2
PDF https://arxiv.org/pdf/1909.03947v2.pdf
PWC https://paperswithcode.com/paper/scheduling-optimization-of-parallel-linear
Repo
Framework
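
The core recipe is plain supervised learning: benchmark features in, best-measured chunk-size out. A minimal scikit-learn sketch with synthetic stand-in data (the feature choice and chunk-size binning below are assumptions, not the paper's exact setup):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in benchmark data: features might describe problem size, thread
# count and operation id; labels are the chunk-size bin that gave the
# best measured throughput for that configuration.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))
y = rng.integers(0, 5, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```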

Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection

Title Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection
Authors David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Abstract Advanced neural language models (NLMs) are widely used in sequence generation tasks because they are able to produce fluent and meaningful sentences. They can also be used to generate fake reviews, which can then be used to attack online review systems and influence the buying decisions of online shoppers. To perform such attacks, it is necessary for experts to train a tailored LM for a specific topic. In this work, we show that a low-skilled threat model can be built just by combining publicly available LMs, and that the produced fake reviews can fool both humans and machines. In particular, we use the GPT-2 NLM to generate a large number of high-quality reviews based on a review with the desired sentiment, and then use a BERT-based text classifier (with an accuracy of 96%) to filter out reviews with undesired sentiments. Because none of the words in the review are modified, fluent samples like the training data can be generated from the learned distribution. A subjective evaluation with 80 participants demonstrated that this simple method can produce reviews that are as fluent as those written by people; it also showed that participants distinguished fake from genuine reviews essentially at random. Three countermeasures, Grover, GLTR, and the OpenAI GPT-2 detector, were all found to have difficulty accurately detecting the fake reviews.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09177v2
PDF https://arxiv.org/pdf/1907.09177v2.pdf
PWC https://paperswithcode.com/paper/generating-sentiment-preserving-fake-online
Repo
Framework
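
The attack pipeline is just generation plus filtering. A hedged sketch using stock Hugging Face checkpoints via the `transformers` pipelines (the paper uses its own GPT-2 setup and a fine-tuned BERT classifier at 96% accuracy; the models and prompt below are placeholders):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

seed = "This restaurant was absolutely wonderful, the food"
candidates = generator(seed, max_length=60, num_return_sequences=5,
                       do_sample=True)

# Keep only generations whose predicted sentiment matches the seed's.
fake_reviews = [c["generated_text"] for c in candidates
                if sentiment(c["generated_text"])[0]["label"] == "POSITIVE"]
print(f"kept {len(fake_reviews)} of {len(candidates)} candidates")
```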

Automatic Model Monitoring for Data Streams

Title Automatic Model Monitoring for Data Streams
Authors Fábio Pinto, Marco O. P. Sampaio, Pedro Bizarro
Abstract Detecting concept drift is a well-known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a drift due to a hot sale item (with an increase in false positives) or due to a true fraud attack (with an increase in false negatives) before labels are available. In this paper we propose SAMM, an automatic model monitoring system for data streams. SAMM detects concept drift using a time- and space-efficient unsupervised streaming algorithm, and it generates alarm reports with a summary of the events and features that are important to explain the drift. SAMM was evaluated on five real-world fraud detection datasets, each spanning periods of up to eight months and totaling more than 22 million online transactions. We evaluated SAMM using human feedback from domain experts, by sending them 100 reports generated by the system. Our results show that SAMM is able to detect anomalous events in a model life cycle that are considered useful by the domain experts. Given these results, SAMM will be rolled out in the next version of Feedzai’s Fraud Detection solution.
Tasks Fraud Detection
Published 2019-08-12
URL https://arxiv.org/abs/1908.04240v1
PDF https://arxiv.org/pdf/1908.04240v1.pdf
PWC https://paperswithcode.com/paper/automatic-model-monitoring-for-data-streams
Repo
Framework
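
The abstract does not spell out SAMM's streaming algorithm, so the sketch below is only a generic stand-in for unsupervised drift detection: compare a trailing reference window of model scores against the most recent window and alarm when a histogram distance crosses a threshold (window size, bin count, and threshold are all invented here):

```python
from collections import deque

import numpy as np

class ScoreDriftMonitor:
    """Generic drift monitor over a stream of model scores in [0, 1].

    Not SAMM itself: a simple reference-window vs. current-window
    comparison using a histogram L1 distance, for illustration only.
    """
    def __init__(self, window=1000, bins=20, threshold=0.3):
        self.ref = deque(maxlen=window)   # frozen reference behaviour
        self.cur = deque(maxlen=window)   # most recent behaviour
        self.bins, self.threshold = bins, threshold

    def update(self, score):
        """Feed one score; return True when drift is flagged."""
        if len(self.ref) < self.ref.maxlen:
            self.ref.append(score)
            return False
        self.cur.append(score)
        if len(self.cur) < self.cur.maxlen:
            return False
        edges = np.linspace(0.0, 1.0, self.bins + 1)
        p, _ = np.histogram(self.ref, bins=edges, density=True)
        q, _ = np.histogram(self.cur, bins=edges, density=True)
        return np.abs(p - q).mean() > self.threshold
```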

Solving Forward and Inverse Problems Using Autoencoders

Title Solving Forward and Inverse Problems Using Autoencoders
Authors Hwan Goh, Sheroze Sheriffdeen, Tan Bui-Thanh
Abstract This work develops model-aware autoencoder networks as a new method for solving scientific forward and inverse problems. Autoencoders are unsupervised neural networks that are able to learn new representations of data through appropriately selected architecture and regularization. The resulting mappings to and from the latent representation can be used to encode and decode the data. In our work, we set the data space to be the parameter space of a parameter of interest we wish to invert for. Further, as a way to encode the underlying physical model into the autoencoder, we enforce the latent space of the autoencoder to be the space of observations of physically-governed phenomena. In doing so, we leverage the well-known capability of a deep neural network as a universal function approximator to simultaneously obtain both the parameter-to-observation and observation-to-parameter maps. The results suggest that this simultaneous learning interacts synergistically to improve the inversion capability of the autoencoder.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.04212v3
PDF https://arxiv.org/pdf/1912.04212v3.pdf
PWC https://paperswithcode.com/paper/solving-forward-and-inverse-problems-using
Repo
Framework
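
The key design choice is that the latent space is pinned to the observation space: the encoder then learns the forward (parameter-to-observation) map and the decoder the inverse map. A minimal PyTorch sketch under that reading, with placeholder dimensions, architectures and synthetic data:

```python
import torch
import torch.nn as nn

param_dim, obs_dim = 16, 8
encoder = nn.Sequential(nn.Linear(param_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))
decoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, param_dim))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

m = torch.randn(256, param_dim)  # parameter samples
d = torch.randn(256, obs_dim)    # their (here fake) simulated observations

for _ in range(100):
    latent = encoder(m)
    # Reconstruction of the parameters plus a penalty tying the latent
    # code to the true observations: encoder ~ forward map,
    # decoder ~ inverse map.
    loss = (nn.functional.mse_loss(decoder(latent), m)
            + nn.functional.mse_loss(latent, d))
    opt.zero_grad()
    loss.backward()
    opt.step()
```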

Scalable Probabilistic Matrix Factorization with Graph-Based Priors

Title Scalable Probabilistic Matrix Factorization with Graph-Based Priors
Authors Jonathan Strahl, Jaakko Peltonen, Hiroshi Mamitsuka, Samuel Kaski
Abstract In matrix factorization, available graph side-information may not be well suited for the matrix completion problem, having edges that disagree with the latent-feature relations learnt from the incomplete data matrix. We show that removing these $\textit{contested}$ edges improves prediction accuracy and scalability. We identify the contested edges through a highly efficient graphical lasso approximation. The identification and removal of contested edges adds no computational complexity to state-of-the-art graph-regularized matrix factorization, remaining linear with respect to the number of non-zeros. The computational load even decreases in proportion to the number of edges removed. Formulating a probabilistic generative model and using expectation maximization to extend graph-regularized alternating least squares (GRALS) guarantees convergence. Rich simulated experiments illustrate the desired properties of the resulting algorithm. In experiments on real data we demonstrate improved prediction accuracy with fewer graph edges (empirical evidence that graph side-information is often inaccurate). A 300-thousand-dimensional graph with three million edges (the Yahoo music side-information) can be analyzed in under ten minutes on a standard laptop computer, demonstrating the efficiency of our graph update.
Tasks Matrix Completion
Published 2019-08-25
URL https://arxiv.org/abs/1908.09393v2
PDF https://arxiv.org/pdf/1908.09393v2.pdf
PWC https://paperswithcode.com/paper/scalable-probabilistic-matrix-factorization
Repo
Framework
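
The paper identifies contested edges with a fast graphical lasso approximation; the sketch below substitutes a much cruder proxy (dropping side-information edges whose latent factors are negatively correlated) just to make the edge-pruning step concrete:

```python
import numpy as np

def prune_contested_edges(edges, U, sim_threshold=0.0):
    """Drop graph edges that disagree with learnt latent features.

    Crude stand-in for the paper's graphical-lasso test: keep a
    side-information edge (i, j) only if the cosine similarity of the
    rows' latent factors exceeds sim_threshold. `edges` is an iterable
    of index pairs, `U` the row latent-factor matrix from a first
    factorization pass.
    """
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    return [(i, j) for i, j in edges if Un[i] @ Un[j] > sim_threshold]
```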

Dataset shift quantification for credit card fraud detection

Title Dataset shift quantification for credit card fraud detection
Authors Yvan Lucas, Pierre-Edouard Portier, Léa Laporte, Sylvie Calabretto, Liyun He-Guelton, Frederic Oblé, Michael Granitzer
Abstract Machine learning and data mining techniques have been used extensively in order to detect credit card fraud. However, purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify, day by day, the dataset shift in our face-to-face credit card transaction dataset (cardholder located in the shop). In practice, we classify the days against each other and measure the accuracy of the classification. The more accurate the classification, the more different the buying behaviour between the two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, weekends, etc.). We then incorporate this dataset-shift knowledge into the credit card fraud detection task as a new feature. This leads to a small improvement in detection.
Tasks Fraud Detection
Published 2019-06-17
URL https://arxiv.org/abs/1906.06977v1
PDF https://arxiv.org/pdf/1906.06977v1.pdf
PWC https://paperswithcode.com/paper/dataset-shift-quantification-for-credit-card
Repo
Framework
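
The day-by-day shift measure is essentially a classifier two-sample test: train a model to tell day i from day j and read separability as distance. A scikit-learn/SciPy sketch on toy data (the classifier, the rescaling, and the cluster count are illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def day_distance(Xi, Xj):
    """Separability of two days of transactions, rescaled to [0, 1].

    Chance-level accuracy (0.5) means indistinguishable buying
    behaviour; perfect separation means maximal shift.
    """
    X = np.vstack([Xi, Xj])
    y = np.r_[np.zeros(len(Xi)), np.ones(len(Xj))]
    acc = cross_val_score(RandomForestClassifier(n_estimators=50), X, y, cv=3).mean()
    return max(acc - 0.5, 0.0) * 2

days = [np.random.randn(200, 10) + 0.1 * (k % 7) for k in range(14)]  # toy daily data
n = len(days)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = day_distance(days[i], days[j])

# Agglomerative clustering of the day-by-day distance matrix.
labels = fcluster(linkage(D[np.triu_indices(n, 1)], method="average"),
                  t=2, criterion="maxclust")
```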

Bias-aware model selection for machine learning of doubly robust functionals

Title Bias-aware model selection for machine learning of doubly robust functionals
Authors Yifan Cui, Eric Tchetgen Tchetgen
Abstract While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, model selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a new model selection framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function. The class of such doubly robust functionals is quite large, including many missing data and causal inference problems. Under double robustness, the estimated functional should incur no bias if either of two nuisance parameters is evaluated at the truth while the other spans a large collection of candidate models. We introduce two model selection criteria for bias reduction of the functional of interest, each based on a novel definition of pseudo-risk for the functional that embodies this double robustness property, and which may thus be used to select the candidate model nearest to fulfilling this property even when all models are wrong. Both selection criteria have a bias-awareness property: selection of one nuisance parameter can be made to compensate for excessive bias due to poor learning of the other nuisance parameter. We establish an oracle property for a multi-fold cross-validation version of the new model selection criteria, which states that our empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each candidate model. We also describe a smooth approximation to the selection criteria which allows for valid post-selection inference. Finally, we perform model selection of a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learning methods to account for confounding, in a study of right heart catheterization in critically ill ICU patients.
Tasks Causal Inference, Density Estimation, Model Selection
Published 2019-11-05
URL https://arxiv.org/abs/1911.02029v1
PDF https://arxiv.org/pdf/1911.02029v1.pdf
PWC https://paperswithcode.com/paper/bias-aware-model-selection-for-machine
Repo
Framework
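
For concreteness, the canonical member of this class of functionals is the mean of an outcome missing at random, whose doubly robust (AIPW) estimator is

$$\hat{\psi}_{\mathrm{dr}} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{A_i\,\{Y_i-\hat{b}(X_i)\}}{\hat{\pi}(X_i)}+\hat{b}(X_i)\right],$$

where $A_i$ indicates that $Y_i$ is observed, $\hat{\pi}$ models $\Pr(A=1\mid X)$ and $\hat{b}$ models $E[Y\mid A=1,X]$. The estimator remains consistent if either nuisance model is correct, which is exactly the double robustness property the proposed pseudo-risk criteria are built around. (This worked example is ours; the paper treats the general class.)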

Towards Efficient Neural Networks On-a-chip: Joint Hardware-Algorithm Approaches

Title Towards Efficient Neural Networks On-a-chip: Joint Hardware-Algorithm Approaches
Authors Xiaocong Du, Gokul Krishnan, Abinash Mohanty, Zheng Li, Gouranga Charan, Yu Cao
Abstract Machine learning algorithms have made significant advances in many applications. However, their hardware implementation on state-of-the-art platforms still faces several challenges and is limited by various factors, such as memory volume, memory bandwidth and interconnection overhead. The adoption of the crossbar architecture with emerging memory technology partially solves the problem but induces process variation and other concerns. In this paper, we present novel solutions to two fundamental issues in crossbar implementations of Artificial Intelligence (AI) algorithms: device variation and insufficient interconnections. These solutions are inspired by the statistical properties of the algorithms themselves, especially the redundancy in neural network nodes and connections. Through Random Sparse Adaptation and by pruning connections following the small-world model, we demonstrate robust and efficient performance on representative datasets such as MNIST and CIFAR-10. Moreover, we present a Continuous Growth and Pruning algorithm for future learning and adaptation on hardware.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1906.08866v1
PDF https://arxiv.org/pdf/1906.08866v1.pdf
PWC https://paperswithcode.com/paper/towards-efficient-neural-networks-on-a-chip
Repo
Framework
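
As one loose reading of the small-world pruning idea, a layer's dense weight matrix can be masked so that each output unit keeps mostly local connections with occasional random rewiring, Watts-Strogatz style. A NumPy sketch (the mapping from the paper's scheme to this mask, and all parameters, are our assumptions):

```python
import numpy as np

def small_world_mask(n_out, n_in, k=8, p=0.1, rng=None):
    """Small-world sparsity mask for a layer: each output unit connects
    to roughly k nearby inputs (local lattice), and each local link is
    rewired to a random input with probability p."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((n_out, n_in), dtype=bool)
    for o in range(n_out):
        centre = int(o * n_in / n_out)
        for d in range(-(k // 2), k // 2 + 1):
            j = (centre + d) % n_in
            if rng.random() < p:          # occasional long-range rewiring
                j = rng.integers(n_in)
            mask[o, j] = True
    return mask

mask = small_world_mask(128, 128)
print("connection density:", mask.mean())  # ~k/n_in instead of 1.0
```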

The Importance of Metric Learning for Robotic Vision: Open Set Recognition and Active Learning

Title The Importance of Metric Learning for Robotic Vision: Open Set Recognition and Active Learning
Authors Benjamin J. Meyer, Tom Drummond
Abstract State-of-the-art deep neural network recognition systems are designed for a static and closed world. It is usually assumed that the distribution at test time will be the same as the distribution during training. As a result, classifiers are forced to categorise observations into one out of a set of predefined semantic classes. Robotic problems are dynamic and open world; a robot will likely observe objects that are from outside of the training set distribution. Classifier outputs in robotic applications can lead to real-world robotic action and as such, a practical recognition system should not silently fail by confidently misclassifying novel observations. We show how a deep metric learning classification system can be applied to such open set recognition problems, allowing the classifier to label novel observations as unknown. Further to detecting novel examples, we propose an open set active learning approach that allows a robot to efficiently query a user about unknown observations. Our approach enables a robot to improve its understanding of the true distribution of data in the environment, from a small number of label queries. Experimental results show that our approach significantly outperforms comparable methods in both the open set recognition and active learning problems.
Tasks Active Learning, Metric Learning, Open Set Learning
Published 2019-02-27
URL http://arxiv.org/abs/1902.10363v1
PDF http://arxiv.org/pdf/1902.10363v1.pdf
PWC https://paperswithcode.com/paper/the-importance-of-metric-learning-for-robotic
Repo
Framework
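
The open-set mechanism reduces to distances in the learnt metric space: observations far from every known class are flagged as unknown, and those flags double as active learning queries for the human. A minimal sketch (the nearest-centroid form and the fixed rejection threshold are simplifications of the paper's approach):

```python
import numpy as np

def classify_open_set(embedding, centroids, reject_dist):
    """Label an embedded observation by its nearest class centroid, or
    as unknown (-1) if every centroid is farther than reject_dist."""
    d = np.linalg.norm(centroids - embedding, axis=1)
    return int(np.argmin(d)) if d.min() < reject_dist else -1

centroids = np.eye(3)  # 3 known classes in a toy 3-d embedding space
print(classify_open_set(np.array([1.0, 0.1, 0.0]), centroids, 0.5))  # -> 0
print(classify_open_set(np.array([0.6, 0.6, 0.6]), centroids, 0.5))  # -> -1

# Active-learning hook: the rejected (unknown) observations are exactly
# the ones worth forwarding to a human labeller.
```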

Grammatical Error Correction and Style Transfer via Zero-shot Monolingual Translation

Title Grammatical Error Correction and Style Transfer via Zero-shot Monolingual Translation
Authors Elizaveta Korotkova, Agnes Luhtaru, Maksym Del, Krista Liin, Daiga Deksne, Mark Fishel
Abstract Both grammatical error correction and text style transfer can be viewed as monolingual sequence-to-sequence transformation tasks, but the scarcity of directly annotated data for either task makes them unfeasible for most languages. We present an approach that does both tasks within the same trained model, and only uses regular language parallel data, without requiring error-corrected or style-adapted texts. We apply our model to three languages and present a thorough evaluation on both tasks, showing that the model is reliable for a number of error types and style transfer aspects.
Tasks Grammatical Error Correction, Style Transfer, Text Style Transfer
Published 2019-03-27
URL https://arxiv.org/abs/1903.11283v2
PDF https://arxiv.org/pdf/1903.11283v2.pdf
PWC https://paperswithcode.com/paper/grammatical-error-correction-and-style
Repo
Framework
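
The zero-shot trick follows the multilingual NMT convention of a target-language tag: the model is trained only on regular cross-lingual pairs, yet asking it at test time to "translate" a language into itself yields corrected or style-shifted text. A hypothetical illustration (the tag format is invented for the example):

```python
# Training only ever sees cross-lingual pairs, e.g. English -> Estonian:
train_pair = ("<2et> I love this movie .", "Mulle meeldib see film .")

# Zero-shot at test time: an English -> English "translation" request,
# which the model resolves as grammatical error correction.
test_input = "<2en> She go to school yesterday ."
```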

Bridging the Gap for Tokenizer-Free Language Models

Title Bridging the Gap for Tokenizer-Free Language Models
Authors Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant
Abstract Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the model is essential to achieving competitive results. In this paper, we show that contrary to this conventional wisdom, tokenizer-free LMs with sufficient capacity can achieve competitive performance on a large scale dataset. We train a vanilla transformer network with 40 self-attention layers on the One Billion Word (lm1b) benchmark and achieve a new state of the art for tokenizer-free LMs, pushing these models to be on par with their word-based counterparts.
Tasks Tokenization
Published 2019-08-27
URL https://arxiv.org/abs/1908.10322v1
PDF https://arxiv.org/pdf/1908.10322v1.pdf
PWC https://paperswithcode.com/paper/bridging-the-gap-for-tokenizer-free-language
Repo
Framework
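
"Tokenizer-free" here simply means the model consumes raw characters (or bytes), so no learned vocabulary or segmentation prior is injected. The input encoding is trivially small:

```python
text = "Bridging the gap"
char_ids = [ord(c) for c in text]        # Unicode code points as token ids
byte_ids = list(text.encode("utf-8"))    # or raw bytes: a fixed vocab of 256
```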

Onto Word Segmentation of the Complete Tang Poems

Title Onto Word Segmentation of the Complete Tang Poems
Authors Chao-Lin Liu
Abstract We aim at segmenting words in the Complete Tang Poems (CTP). Although it is possible to do some research about CTP without doing full-scale word segmentation, we must move forward to word-level analysis of CTP for conducting advanced research topics. In November 2018, when we submitted the manuscript for DH 2019 (ADHO), we had collected only 2433 poems that were segmented by trained experts, and used the segmented poems to evaluate a segmenter that considered domain knowledge of Chinese poetry. We trained pointwise mutual information (PMI) between Chinese characters based on the CTP poems (excluding the 2433 poems, which were used exclusively for testing) and the domain knowledge. The segmenter relied on the PMI information to recover 85.7% of the words in the test poems. However, we could segment a poem completely correctly only 17.8% of the time. By the time we presented our work at DH 2019, we had annotated more than 20000 poems. With a much larger amount of data, we were able to apply biLSTM models to this word segmentation task, and we segmented a poem completely correctly more than 20% of the time. In contrast, human annotators completely agreed on their annotations about 40% of the time.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10621v1
PDF https://arxiv.org/pdf/1908.10621v1.pdf
PWC https://paperswithcode.com/paper/onto-word-segmentation-of-the-complete-tang
Repo
Framework
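
The PMI segmenter is straightforward to reconstruct in outline: estimate PMI between adjacent characters from the training poems, then place word boundaries where PMI is low. A simplified Python sketch (the released system also folds in domain knowledge of Chinese poetry, which is omitted here; the threshold is arbitrary):

```python
import math
from collections import Counter

def train_pmi(poems):
    """PMI between adjacent characters: log P(a,b) / (P(a) P(b))."""
    uni, bi = Counter(), Counter()
    for poem in poems:
        uni.update(poem)
        bi.update(zip(poem, poem[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {p: math.log((c / n_bi) / (uni[p[0]] / n_uni * uni[p[1]] / n_uni))
            for p, c in bi.items()}

def segment(line, pmi, threshold=0.0):
    """Insert a word boundary wherever adjacent-character PMI is low."""
    if not line:
        return line
    out = [line[0]]
    for a, b in zip(line, line[1:]):
        if pmi.get((a, b), -1.0) < threshold:
            out.append(" ")
        out.append(b)
    return "".join(out)
```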

Visual Imitation Learning with Recurrent Siamese Networks

Title Visual Imitation Learning with Recurrent Siamese Networks
Authors Glen Berseth, Christopher J. Pal
Abstract It would be desirable for a reinforcement learning (RL) based agent to learn behaviour by merely watching a demonstration. However, defining rewards that facilitate this goal within the RL paradigm remains a challenge. Here we address this problem with Siamese networks, trained to compute distances between observed behaviours and the agent’s behaviours. Given a desired motion, such Siamese networks can be used to provide a reward signal to an RL agent via the distance between the desired motion and the agent’s motion. We experiment with an RNN-based comparator model that can compute distances in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also found that the inclusion of multi-task data and an additional image encoding loss helps enforce temporal consistency. These two components appear to balance reward for matching a specific instance of behaviour versus that behaviour in general. Furthermore, we focus here on a particularly challenging form of this problem where only a single demonstration is provided for a given task – the one-shot learning setting. We demonstrate our approach on humanoid agents in both 2D with $10$ degrees of freedom (DoF) and 3D with $38$ DoF.
Tasks Imitation Learning, One-Shot Learning
Published 2019-01-22
URL https://arxiv.org/abs/1901.07186v2
PDF https://arxiv.org/pdf/1901.07186v2.pdf
PWC https://paperswithcode.com/paper/visual-imitation-learning-with-recurrent
Repo
Framework
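
The reward construction is the negative distance between embeddings of the demonstration clip and the agent's rollout. A PyTorch sketch with a placeholder GRU encoder standing in for the paper's recurrent Siamese network:

```python
import torch
import torch.nn as nn

class SeqEncoder(nn.Module):
    """Recurrent encoder mapping an observation sequence to an embedding."""
    def __init__(self, obs_dim, hid=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hid, batch_first=True)

    def forward(self, seq):          # seq: [batch, time, obs_dim]
        _, h = self.rnn(seq)
        return h[-1]                 # final hidden state as the embedding

encoder = SeqEncoder(obs_dim=10)
demo = torch.randn(1, 50, 10)        # demonstration clip (fake data)
rollout = torch.randn(1, 50, 10)     # agent's motion (fake data)

with torch.no_grad():
    # Negative embedding distance used as the RL reward signal.
    reward = -torch.norm(encoder(demo) - encoder(rollout))
```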

Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards

Title Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards
Authors Xingyu Lu, Stas Tiomkin, Pieter Abbeel
Abstract While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge. In this work, we propose an effective reward shaping method through predictive coding to tackle sparse reward problems. By learning predictive representations offline and using these representations for reward shaping, we gain access to reward signals that understand the structure and dynamics of the environment. In particular, our method achieves better learning by providing reward signals that 1) understand environment dynamics, 2) emphasize the features most useful for learning, and 3) resist noise in learned representations through reward accumulation. We demonstrate the usefulness of this approach in domains ranging from robotic manipulation to navigation, and we show that reward signals produced through predictive coding are as effective for learning as hand-crafted rewards.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.13414v1
PDF https://arxiv.org/pdf/1912.13414v1.pdf
PWC https://paperswithcode.com/paper/predictive-coding-for-boosting-deep-1
Repo
Framework
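
One simple instantiation of the shaping idea: augment the sparse environment reward with a dense term measuring progress in the offline-learnt representation space (the exact form below is ours, not the paper's objective):

```python
import numpy as np

def shaped_reward(env_reward, phi_s, phi_goal, scale=0.1):
    """Sparse environment reward plus a dense shaping term: negative
    distance between the predictive representation of the current state
    and that of the goal. Accumulating this term over a trajectory helps
    average out noise in the learned representation."""
    return env_reward - scale * float(np.linalg.norm(phi_s - phi_goal))
```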