April 1, 2020


Paper Group ANR 493

PHOTON – A Python API for Rapid Machine Learning Model Development


Title	PHOTON – A Python API for Rapid Machine Learning Model Development
Authors	Ramona Leenings, Nils Ralf Winter, Lucas Plagwitz, Vincent Holstein, Jan Ernsting, Jakob Steenweg, Julian Gebker, Kelvin Sarink, Daniel Emden, Dominik Grotegerd, Nils Opel, Benjamin Risse, Xiaoyi Jiang, Udo Dannlowski, Tim Hahn
Abstract	This article describes the implementation and use of PHOTON, a high-level Python API designed to simplify and accelerate the process of machine learning model development. It enables designing both basic and advanced machine learning pipeline architectures and automatizes the repetitive training, optimization and evaluation workflow. PHOTON offers easy access to established machine learning toolboxes as well as the possibility to integrate custom algorithms and solutions for any part of the model construction and evaluation process. By adding a layer of abstraction incorporating current best practices it offers an easy-to-use, flexible approach to implementing fast, reproducible, and unbiased machine learning solutions.
Tasks
Published	2020-02-13
URL	https://arxiv.org/abs/2002.05426v1
PDF	https://arxiv.org/pdf/2002.05426v1.pdf
PWC	https://paperswithcode.com/paper/photon-a-python-api-for-rapid-machine
Repo
Framework

Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment


Title	Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
Authors	Ben Usman, Nick Dufour, Avneesh Sud, Kate Saenko
Abstract	Unsupervised distribution alignment has many applications in deep learning, including domain adaptation and unsupervised image-to-image translation. Most prior work on unsupervised distribution alignment relies either on minimizing simple non-parametric statistical distances such as maximum mean discrepancy, or on adversarial alignment. However, the former fails to capture the structure of complex real-world distributions, while the latter is difficult to train and does not provide any universal convergence guarantees or automatic quantitative validation procedures. In this paper we propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows. We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence. We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
Tasks	Domain Adaptation, Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12170v1
PDF	https://arxiv.org/pdf/2003.12170v1.pdf
PWC	https://paperswithcode.com/paper/log-likelihood-ratio-minimizing-flows-towards
Repo
Framework
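
The log-likelihood ratio statistic named in the abstract above can be illustrated with a toy sketch. It assumes 1D Gaussian models fitted by maximum likelihood in place of normalizing flows, and it only evaluates the statistic rather than learning an alignment; the function names and sample values are hypothetical and do not come from the paper.

```python
import math


def gaussian_max_loglik(xs):
    """Log-likelihood of xs under the 1D Gaussian fitted to xs by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # MLE variance (divides by n)
    # At the MLE the squared-error term of the log-likelihood equals n / 2.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def log_likelihood_ratio(a, b):
    """Fit of two separate models minus fit of one shared model on the pooled data."""
    return gaussian_max_loglik(a) + gaussian_max_loglik(b) - gaussian_max_loglik(a + b)


# Hypothetical samples: the second set is the first one shifted by 3.
source = [0.1, 0.4, 0.35, 0.8, 0.05, 0.6]
shifted = [x + 3.0 for x in source]

llr_far = log_likelihood_ratio(source, shifted)        # misaligned samples
llr_same = log_likelihood_ratio(source, list(source))  # identical samples
```

In this toy setting the statistic is positive for the shifted samples and vanishes (up to rounding) when the two samples coincide, because the shared model then fits as well as the two separate ones.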

Unpaired Image-to-Image Translation using Adversarial Consistency Loss


Title	Unpaired Image-to-Image Translation using Adversarial Consistency Loss
Authors	Yihao Zhao, Ruihai Wu, Hao Dong
Abstract	Unpaired image-to-image translation is a class of vision problems whose goal is to find the mapping between different image domains using unpaired training data. Cycle-consistency loss is a widely used constraint for such problems. However, due to the strict pixel-level constraint, it cannot perform geometric changes, remove large objects, or ignore irrelevant texture. In this paper, we propose a novel adversarial-consistency loss for image-to-image translation. This loss does not require the translated image to be translated back to be a specific source image but can encourage the translated images to retain important features of the source images and overcome the drawbacks of cycle-consistency loss noted above. Our method achieves state-of-the-art results on three challenging tasks: glasses removal, male-to-female translation, and selfie-to-anime translation.
Tasks	Image-to-Image Translation
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04858v1
PDF	https://arxiv.org/pdf/2003.04858v1.pdf
PWC	https://paperswithcode.com/paper/unpaired-image-to-image-translation-using-2
Repo
Framework

Adaptive Name Entity Recognition under Highly Unbalanced Data


Title	Adaptive Name Entity Recognition under Highly Unbalanced Data
Authors	Thong Nguyen, Duy Nguyen, Pramod Rao
Abstract	For several purposes in Natural Language Processing (NLP), such as Information Extraction, Sentiment Analysis or Chatbots, Named Entity Recognition (NER) plays an important role, as it helps to determine and categorize entities in text into predefined groups such as the names of persons, locations, quantities, organizations or percentages. In this report, we present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (Bi-LSTM) layer for solving NER tasks. We also employ a fusion input of embedding vectors (GloVe, BERT), which are pre-trained on a huge corpus, to boost the generalization capacity of the model. Unfortunately, due to the heavily unbalanced distribution across the training data, both approaches performed poorly on classes with few training samples. To overcome this challenge, we introduce an add-on classification model that splits sentences into two different sets, Weak and Strong classes, and then design a couple of Bi-LSTM-CRF models to optimize performance on each set. We evaluated our models on the test set and found that our method can improve performance for Weak classes significantly by using a very small data set (approximately 0.45%) compared to the remaining classes.
Tasks	Chatbot, Named Entity Recognition, Sentiment Analysis
Published	2020-03-10
URL	https://arxiv.org/abs/2003.10296v1
PDF	https://arxiv.org/pdf/2003.10296v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-name-entity-recognition-under-highly
Repo
Framework

What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI


Title	What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI
Authors	Chaehan So
Abstract	When people buy products online, they primarily base their decisions on the recommendations of others given in online reviews. The current work analyzed these online reviews by sentiment analysis and used the extracted sentiments as features to predict the product ratings by several machine learning algorithms. These predictions were disentangled by various methods of explainable AI (XAI) to understand whether the model showed any bias during prediction. Study 1 benchmarked these algorithms (knn, support vector machines, random forests, gradient boosting machines, XGBoost) and identified random forests and XGBoost as the best algorithms for predicting the product ratings. In Study 2, the analysis of global feature importance identified the sentiment joy and the emotional valence negative as the most predictive features. Two XAI visualization methods, local feature attributions and partial dependency plots, revealed several incorrect prediction mechanisms on the instance level. Performing the benchmarking as classification, Study 3 identified a high no-information rate of 64.4% that indicated high class imbalance as the underlying reason for the identified problems. In conclusion, good performance by machine learning algorithms must be taken with caution because the dataset, as encountered in this work, could be biased towards certain predictions. This work demonstrates how XAI methods reveal such prediction bias.
Tasks	Feature Importance, Sentiment Analysis
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00201v1
PDF	https://arxiv.org/pdf/2003.00201v1.pdf
PWC	https://paperswithcode.com/paper/what-emotions-make-one-or-five-stars
Repo
Framework
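
The no-information rate mentioned in the abstract above is the accuracy obtained by always predicting the most frequent class. A minimal sketch of that computation follows; the rating counts are hypothetical and were chosen only so that the majority share matches the reported 64.4%. The abstract does not say which rating was the majority class, so the choice of five stars here is an assumption.

```python
from collections import Counter


def no_information_rate(labels):
    """Return the majority label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Hypothetical star ratings for 1000 reviews (not the paper's data).
ratings = [5] * 644 + [4] * 150 + [3] * 80 + [2] * 50 + [1] * 76

label, rate = no_information_rate(ratings)
```

A classifier evaluated on such data needs to exceed this rate before its accuracy says anything about what it has learned from the features.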

Stochastic Natural Language Generation Using Dependency Information


Title	Stochastic Natural Language Generation Using Dependency Information
Authors	Elham Seifossadat, Hossein Sameti
Abstract	This article presents a stochastic corpus-based model for generating natural language text. Our model first encodes dependency relations from training data through a feature set, then concatenates these features to produce a new dependency tree for a given meaning representation, and finally generates a natural language utterance from the produced dependency tree. We test our model on nine domains from tabular, dialogue act and RDF format. Our model outperforms the corpus-based state-of-the-art methods trained on tabular datasets and also achieves comparable results with neural network-based approaches trained on dialogue act, E2E and WebNLG datasets for BLEU and ERR evaluation metrics. Also, by reporting Human Evaluation results, we show that our model produces high-quality utterances in aspects of informativeness and naturalness as well as quality.
Tasks	Text Generation
Published	2020-01-12
URL	https://arxiv.org/abs/2001.03897v1
PDF	https://arxiv.org/pdf/2001.03897v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-natural-language-generation-using
Repo
Framework

Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks


Title	Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks
Authors	Souvik Kundu, Mahdi Nazemi, Massoud Pedram, Keith M. Chugg, Peter A. Beerel
Abstract	The high energy cost of processing deep convolutional neural networks impedes their ubiquitous deployment in energy-constrained platforms such as embedded systems and IoT devices. This work introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters. Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency due to reduced DRAM accesses, thus promising significant improvements in the trade-off between energy consumption and accuracy for both training and inference. To evaluate this approach, we performed experiments with two widely accepted datasets, CIFAR-10 and Tiny ImageNet in sparse variants of the ResNet18 and VGG16 architectures. Compared to baseline models, our proposed sparse variants require up to 82% fewer model parameters with 5.6× fewer FLOPs with negligible loss in accuracy for ResNet18 on CIFAR-10. For VGG16 trained on Tiny ImageNet, our approach requires 5.8× fewer FLOPs and up to 83.3% fewer model parameters with a drop in top-5 (top-1) accuracy of only 1.2% (2.1%). We also compared the performance of our proposed architectures with that of ShuffleNet and MobileNetV2. Using similar hyperparameters and FLOPs, our ResNet18 variants yield an average accuracy improvement of 2.8%.
Tasks
Published	2020-01-29
URL	https://arxiv.org/abs/2001.10710v2
PDF	https://arxiv.org/pdf/2001.10710v2.pdf
PWC	https://paperswithcode.com/paper/pre-defined-sparsity-for-low-complexity
Repo
Framework
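
The idea of pre-defined support sets that repeat periodically across kernels can be sketched as follows. The two 3x3 patterns, the period of 2, and the helper names are hypothetical choices for illustration; the abstract does not specify the patterns or the period used in the paper.

```python
# Two hypothetical 3x3 support patterns (1 = trainable weight, 0 = fixed zero).
PATTERNS = [
    ((1, 0, 1),
     (0, 0, 0),
     (1, 0, 1)),
    ((0, 1, 0),
     (1, 0, 1),
     (0, 1, 0)),
]


def periodic_support_masks(num_kernels, patterns):
    """Assign each kernel a support pattern, cycling through `patterns` periodically."""
    return [patterns[k % len(patterns)] for k in range(num_kernels)]


def parameter_savings(masks):
    """Fraction of weights removed relative to dense kernels of the same shape."""
    total = sum(len(row) for mask in masks for row in mask)
    kept = sum(value for mask in masks for row in mask for value in row)
    return 1.0 - kept / total


masks = periodic_support_masks(num_kernels=8, patterns=PATTERNS)
savings = parameter_savings(masks)
```

Because the patterns repeat, only the patterns within one period have to be stored as index information; every other kernel reuses them, which is the kind of storage regularity the abstract refers to.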

Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis


Title	Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis
Authors	Junfan Lin, Ziliang Chen, Xiaodan Liang, Keze Wang, Liang Lin
Abstract	Medical automatic diagnosis (MAD) aims to learn an agent that mimics the behavior of a human doctor, i.e. inquiring symptoms and informing diseases. Due to medical ethics concerns, it is impractical to directly apply reinforcement learning techniques to solving MAD, e.g., training a reinforced agent with the human patient. Developing a patient simulator by using the collected patient-doctor dialogue records has been proposed as a promising approach to MAD. However, most of these existing works overlook the causal relationship between patient symptoms and disease diagnoses. For example, these simulators simply generate the “not-sure” response to the inquiry (i.e., symptom) that was not observed in one dialogue record. As a result, the MAD agent is usually trained without exploiting the counterfactual reasoning beyond the factual observations. To address this problem, this paper presents a propensity-based patient simulator (PBPS), which is capable of facilitating the training of MAD agents by generating informative counterfactual answers along with the disease diagnosis. Specifically, our PBPS estimates the propensity score of each record with the patient-doctor dialogue reasoning, and can thus generate the counterfactual answers by searching across records. That is, the unrecorded symptom for one patient can be found in the records of other patients according to the propensity score matching. A progressive assurance agent (P2A) can thus be trained with PBPS, which includes two separate yet cooperative branches accounting for the execution of symptom-inquiry and disease-diagnosis actions, respectively. The disease-diagnosis branch predicts the confidence of disease and drives the symptom-inquiry branch in terms of enhancing the confidence, and the two branches are jointly optimized, each benefiting from the other.
Tasks
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06534v1
PDF	https://arxiv.org/pdf/2003.06534v1.pdf
PWC	https://paperswithcode.com/paper/learning-reinforced-agents-with
Repo
Framework

Not all domains are equally complex: Adaptive Multi-Domain Learning


Title	Not all domains are equally complex: Adaptive Multi-Domain Learning
Authors	Ali Senhaji, Jenni Raitoharju, Moncef Gabbouj, Alexandros Iosifidis
Abstract	Deep learning approaches are highly specialized and require training separate models for different tasks. Multi-domain learning looks at ways to learn a multitude of different tasks, each coming from a different domain, at once. The most common approach in multi-domain learning is to form a domain agnostic model, the parameters of which are shared among all domains, and learn a small number of extra domain-specific parameters for each individual new domain. However, different domains come with different levels of difficulty; parameterizing the models of all domains using an augmented version of the domain agnostic model leads to unnecessarily inefficient solutions, especially for easy to solve tasks. We propose an adaptive parameterization approach to deep neural networks for multi-domain learning. The proposed approach performs on par with the original approach while reducing by far the number of parameters, leading to efficient multi-domain learning solutions.
Tasks
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11504v1
PDF	https://arxiv.org/pdf/2003.11504v1.pdf
PWC	https://paperswithcode.com/paper/not-all-domains-are-equally-complex-adaptive
Repo
Framework

Distributional Robustness and Regularization in Reinforcement Learning


Title	Distributional Robustness and Regularization in Reinforcement Learning
Authors	Esther Derman, Shie Mannor
Abstract	Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO’s extension to sequential decision-making overcomes external uncertainty through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address internal uncertainty due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they hold out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning methods.
Tasks	Decision Making
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02894v1
PDF	https://arxiv.org/pdf/2003.02894v1.pdf
PWC	https://paperswithcode.com/paper/distributional-robustness-and-regularization
Repo
Framework

On implicit regularization: Morse functions and applications to matrix factorization


Title	On implicit regularization: Morse functions and applications to matrix factorization
Authors	Mohamed Ali Belabbas
Abstract	In this paper, we revisit implicit regularization from the ground up using notions from dynamical systems and invariant subspaces of Morse functions. The key contributions are a new criterion for implicit regularization—a leading contender to explain the generalization power of deep models such as neural networks—and a general blueprint to study it. We apply these techniques to settle a conjecture on implicit regularization in matrix factorization.
Tasks
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04264v2
PDF	https://arxiv.org/pdf/2001.04264v2.pdf
PWC	https://paperswithcode.com/paper/on-implicit-regularization-morse-functions
Repo
Framework

Interpolated Adjoint Method for Neural ODEs


Title	Interpolated Adjoint Method for Neural ODEs
Authors	Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets
Abstract	In this paper, we propose a method, which allows us to alleviate or completely avoid the notorious problem of numerical instability and stiffness of the adjoint method for training neural ODE. On the backward pass, we propose to use the machinery of smooth function interpolation to restore the trajectory obtained during the forward integration. We show the viability of our approach, both in theory and practice.
Tasks
Published	2020-03-11
URL	https://arxiv.org/abs/2003.05271v1
PDF	https://arxiv.org/pdf/2003.05271v1.pdf
PWC	https://paperswithcode.com/paper/interpolated-adjoint-method-for-neural-odes
Repo
Framework
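
Restoring a forward trajectory by smooth interpolation, as described in the abstract above, can be sketched on a scalar ODE. The sketch assumes Chebyshev nodes with barycentric Lagrange interpolation and an RK4 forward solver; these are illustrative choices that the abstract does not state, and all function names are hypothetical.

```python
import math


def chebyshev_nodes(n, t0, t1):
    """n + 1 Chebyshev points of the second kind mapped to [t0, t1], ascending."""
    mid, half = 0.5 * (t0 + t1), 0.5 * (t1 - t0)
    return [mid - half * math.cos(math.pi * j / n) for j in range(n + 1)]


def rk4_step(f, t, z, h):
    """One classical Runge-Kutta step for dz/dt = f(t, z)."""
    k1 = f(t, z)
    k2 = f(t + 0.5 * h, z + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, z + 0.5 * h * k2)
    k4 = f(t + h, z + h * k3)
    return z + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def forward_values_at_nodes(f, z0, nodes, substeps=20):
    """Forward pass: integrate from node to node and keep z only at the nodes."""
    values, z = [z0], z0
    for a, b in zip(nodes, nodes[1:]):
        h, t = (b - a) / substeps, a
        for _ in range(substeps):
            z = rk4_step(f, t, z, h)
            t += h
        values.append(z)
    return values


def barycentric_interpolate(nodes, values, t):
    """Evaluate the interpolating polynomial through (nodes, values) at time t."""
    n = len(nodes) - 1
    num = den = 0.0
    for j, (tj, zj) in enumerate(zip(nodes, values)):
        if t == tj:
            return zj
        # Barycentric weights for second-kind Chebyshev points (up to a common factor).
        w = (-1.0) ** j * (0.5 if j in (0, n) else 1.0)
        num += w * zj / (t - tj)
        den += w / (t - tj)
    return num / den


def decay(t, z):
    """Toy dynamics dz/dt = -z with exact solution z(t) = exp(-t)."""
    return -z


nodes = chebyshev_nodes(8, 0.0, 1.0)
values = forward_values_at_nodes(decay, 1.0, nodes)
restored = barycentric_interpolate(nodes, values, 0.37)
error = abs(restored - math.exp(-0.37))
```

During a backward pass, a solver built this way would query `barycentric_interpolate` for z(t) at whatever times it needs instead of integrating the state backward in time.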

Improving S&P stock prediction with time series stock similarity


Title	Improving S&P stock prediction with time series stock similarity
Authors	Lior Sidi
Abstract	Stock market prediction with forecasting algorithms is a popular topic these days, where most of the forecasting algorithms train only on data collected on a particular stock. In this paper, we enriched the stock data with related stocks, just as a professional trader would have done, to improve the stock prediction models. We tested five different similarity functions and found co-integration similarity to have the best improvement on the prediction model. We evaluate the models on seven S&P stocks from various industries over a five-year period. The prediction model we trained on similar stocks had significantly better results, with 0.55 mean accuracy and 19.782 profit, compared to the state-of-the-art model with an accuracy of 0.52 and profit of 6.6.
Tasks	Stock Market Prediction, Stock Prediction, Time Series
Published	2020-02-08
URL	https://arxiv.org/abs/2002.05784v1
PDF	https://arxiv.org/pdf/2002.05784v1.pdf
PWC	https://paperswithcode.com/paper/improving-sp-stock-prediction-with-time
Repo
Framework

OmniTact: A Multi-Directional High Resolution Touch Sensor


Title	OmniTact: A Multi-Directional High Resolution Touch Sensor
Authors	Akhil Padmanabha, Frederik Ebert, Stephen Tian, Roberto Calandra, Chelsea Finn, Sergey Levine
Abstract	Incorporating touch as a sensing modality for robots can enable finer and more robust manipulation skills. Existing tactile sensors are either flat, have small sensitive fields or only provide low-resolution signals. In this paper, we introduce OmniTact, a multi-directional high-resolution tactile sensor. OmniTact is designed to be used as a fingertip for robotic manipulation with robotic hands, and uses multiple micro-cameras to detect multi-directional deformations of a gel-based skin. This provides a rich signal from which a variety of different contact state variables can be inferred using modern image processing and computer vision methods. We evaluate the capabilities of OmniTact on a challenging robotic control task that requires inserting an electrical connector into an outlet, as well as a state estimation problem that is representative of those typically encountered in dexterous robotic manipulation, where the goal is to infer the angle of contact of a curved finger pressing against an object. Both tasks are performed using only touch sensing and deep convolutional neural networks to process images from the sensor’s cameras. We compare with a state-of-the-art tactile sensor that is only sensitive on one side, as well as a state-of-the-art multi-directional tactile sensor, and find that OmniTact’s combination of high-resolution and multi-directional sensing is crucial for reliably inserting the electrical connector and allows for higher accuracy in the state estimation task. Videos and supplementary material can be found at https://sites.google.com/berkeley.edu/omnitact
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.06965v1
PDF	https://arxiv.org/pdf/2003.06965v1.pdf
PWC	https://paperswithcode.com/paper/omnitact-a-multi-directional-high-resolution
Repo
Framework

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT


Title	Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
Authors	Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, Caiming Xiong
Abstract	There is an increasing amount of literature that claims the brittleness of deep neural networks in dealing with adversarial examples that are created maliciously. It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data, particularly mistakes in typing on the keyboard that occur inadvertently. Intensive experiments on sentiment analysis and question answering benchmarks indicate that: (i) Typos in different words of a sentence do not have equal influence; typos in informative words cause more severe damage; (ii) Mistype is the most damaging factor, compared with inserting, deleting, etc.; (iii) Humans and machines have different focuses on recognizing adversarial attacks.
Tasks	Question Answering, Sentiment Analysis
Published	2020-02-27
URL	https://arxiv.org/abs/2003.04985v1
PDF	https://arxiv.org/pdf/2003.04985v1.pdf
PWC	https://paperswithcode.com/paper/adv-bert-bert-is-not-robust-on-misspellings
Repo
Framework
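
The typo types named in the abstract above (inserting, deleting, mistyping) can be sketched as simple character-level perturbations. The sketch assumes that a mistype means replacing a character with a neighboring key; the partial keyboard map and the function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical, partial QWERTY neighbor map used only for illustration.
KEYBOARD_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "n": "bhjm",
    "o": "iklp", "r": "edft", "s": "awedxz", "t": "rfgy",
}


def insert_char(word, rng):
    """Insert one random lowercase letter at a random position."""
    pos = rng.randrange(len(word) + 1)
    return word[:pos] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[pos:]


def delete_char(word, rng):
    """Delete one character; words shorter than two characters are left unchanged."""
    if len(word) < 2:
        return word
    pos = rng.randrange(len(word))
    return word[:pos] + word[pos + 1:]


def mistype_char(word, rng):
    """Replace one character with a neighboring key, if the map covers any character."""
    candidates = [i for i, c in enumerate(word) if c in KEYBOARD_NEIGHBORS]
    if not candidates:
        return word
    pos = rng.choice(candidates)
    return word[:pos] + rng.choice(KEYBOARD_NEIGHBORS[word[pos]]) + word[pos + 1:]


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
word = "excellent"
inserted = insert_char(word, rng)
deleted = delete_char(word, rng)
mistyped = mistype_char(word, rng)
```

Applying such perturbations to chosen words of a benchmark sentence and re-running the model is one way to probe the sensitivity the abstract describes.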