Paper Group ANR 493
This group contains the following papers:

- PHOTON – A Python API for Rapid Machine Learning Model Development
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
- Unpaired Image-to-Image Translation using Adversarial Consistency Loss
- Adaptive Name Entity Recognition under Highly Unbalanced Data
- What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI
- Stochastic Natural Language Generation Using Dependency Information
- Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks
- Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis
- Not all domains are equally complex: Adaptive Multi-Domain Learning
- Distributional Robustness and Regularization in Reinforcement Learning
- On implicit regularization: Morse functions and applications to matrix factorization
- Interpolated Adjoint Method for Neural ODEs
- Improving S&P stock prediction with time series stock similarity
- OmniTact: A Multi-Directional High Resolution Touch Sensor
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
PHOTON – A Python API for Rapid Machine Learning Model Development
Title | PHOTON – A Python API for Rapid Machine Learning Model Development |
Authors | Ramona Leenings, Nils Ralf Winter, Lucas Plagwitz, Vincent Holstein, Jan Ernsting, Jakob Steenweg, Julian Gebker, Kelvin Sarink, Daniel Emden, Dominik Grotegerd, Nils Opel, Benjamin Risse, Xiaoyi Jiang, Udo Dannlowski, Tim Hahn |
Abstract | This article describes the implementation and use of PHOTON, a high-level Python API designed to simplify and accelerate the process of machine learning model development. It enables designing both basic and advanced machine learning pipeline architectures and automates the repetitive training, optimization and evaluation workflow. PHOTON offers easy access to established machine learning toolboxes as well as the possibility to integrate custom algorithms and solutions for any part of the model construction and evaluation process. By adding a layer of abstraction that incorporates current best practices, it offers an easy-to-use, flexible approach to implementing fast, reproducible, and unbiased machine learning solutions. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05426v1 |
PDF | https://arxiv.org/pdf/2002.05426v1.pdf |
PWC | https://paperswithcode.com/paper/photon-a-python-api-for-rapid-machine |
Repo | |
Framework | |
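The abstract describes an API that automates the repetitive train/optimize/evaluate loop. As a hedged illustration of the workflow such a layer of abstraction wraps, here is the equivalent nested cross-validation and hyperparameter search written in plain scikit-learn; all identifiers below are scikit-learn's own, not PHOTON's classes, which are documented in the paper.

```python
# Sketch of the repetitive workflow a high-level API like PHOTON automates:
# nested cross-validation with an inner hyperparameter search.
# All names here are scikit-learn's; PHOTON's API differs (see the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # inner loop: model selection
)
# Outer loop: unbiased performance estimate of the whole selection procedure.
scores = cross_val_score(search, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=0))
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The appeal of a PHOTON-style API is that this nested structure, which is easy to get subtly wrong, is set up once behind the abstraction and reused across projects.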
Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
Title | Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment |
Authors | Ben Usman, Nick Dufour, Avneesh Sud, Kate Saenko |
Abstract | Unsupervised distribution alignment has many applications in deep learning, including domain adaptation and unsupervised image-to-image translation. Most prior work on unsupervised distribution alignment relies either on minimizing simple non-parametric statistical distances such as maximum mean discrepancy, or on adversarial alignment. However, the former fails to capture the structure of complex real-world distributions, while the latter is difficult to train and does not provide any universal convergence guarantees or automatic quantitative validation procedures. In this paper we propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows. We show that, under certain assumptions, this combination yields a deep neural likelihood-based minimization objective that attains a known lower bound upon convergence. We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains. |
Tasks | Domain Adaptation, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.12170v1 |
PDF | https://arxiv.org/pdf/2003.12170v1.pdf |
PWC | https://paperswithcode.com/paper/log-likelihood-ratio-minimizing-flows-towards |
Repo | |
Framework | |
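As a hedged formalization of the objective the abstract outlines (our notation, not necessarily the paper's): given samples $A$ and $B$ from two domains, a normalizing flow $T$, and a parametric density family $p_\theta$, a log-likelihood ratio statistic compares fitting one shared density to the pooled data against fitting each set separately.

```latex
% Hedged sketch; our notation, not necessarily the paper's.
% T: normalizing flow pushing samples from domain A toward domain B;
% p_theta: a parametric (e.g., flow-based) density model.
\Lambda(T) =
  \max_{\theta_1, \theta_2}\Big[ \sum_{a \in A} \log p_{\theta_1}(T(a))
    + \sum_{b \in B} \log p_{\theta_2}(b) \Big]
  - \max_{\theta}\Big[ \sum_{a \in A} \log p_{\theta}(T(a))
    + \sum_{b \in B} \log p_{\theta}(b) \Big]
  \;\ge\; 0.
```

Read this way, $\Lambda(T) \ge 0$ always holds (separate fits can never do worse than a shared fit), and $\Lambda(T) = 0$ exactly when a single shared density explains both $T(A)$ and $B$, which is one way to interpret the "known lower bound upon convergence" the abstract mentions.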
Unpaired Image-to-Image Translation using Adversarial Consistency Loss
Title | Unpaired Image-to-Image Translation using Adversarial Consistency Loss |
Authors | Yihao Zhao, Ruihai Wu, Hao Dong |
Abstract | Unpaired image-to-image translation is a class of vision problems whose goal is to find the mapping between different image domains using unpaired training data. Cycle-consistency loss is a widely used constraint for such problems. However, due to its strict pixel-level constraint, it cannot perform geometric changes, remove large objects, or ignore irrelevant texture. In this paper, we propose a novel adversarial-consistency loss for image-to-image translation. This loss does not require the translated image to be translated back into a specific source image, yet it encourages the translated images to retain important features of the source images, overcoming the drawbacks of cycle-consistency loss noted above. Our method achieves state-of-the-art results on three challenging tasks: glasses removal, male-to-female translation, and selfie-to-anime translation. |
Tasks | Image-to-Image Translation |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04858v1 |
PDF | https://arxiv.org/pdf/2003.04858v1.pdf |
PWC | https://paperswithcode.com/paper/unpaired-image-to-image-translation-using-2 |
Repo | |
Framework | |
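To make the contrast with cycle-consistency concrete, here is a hedged sketch in our own notation (not the paper's exact formulation): cycle-consistency pins the back-translation to the exact source pixels, while an adversarial consistency only asks it to look like a plausible source-domain image that retains the source's important features.

```latex
% Hedged sketch in our notation (G: X -> Y, F: Y -> X, D: discriminator on X).
% Cycle-consistency: strict pixel-level constraint on the back-translation.
\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{x \sim p_X} \big\| F(G(x)) - x \big\|_1
% Adversarial consistency: only require F(G(x)) to be a realistic source-domain
% image that keeps x's important features, not a pixel-exact copy of x.
\mathcal{L}_{\mathrm{adv\text{-}con}} =
  \mathbb{E}_{x \sim p_X} \big[ \log D(x) \big]
  + \mathbb{E}_{x \sim p_X} \big[ \log \big( 1 - D(F(G(x))) \big) \big]
```

Relaxing the pixel-exact requirement is what lets the translator perform geometric changes and remove large objects without the back-translation penalty pulling them back.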
Adaptive Name Entity Recognition under Highly Unbalanced Data
Title | Adaptive Name Entity Recognition under Highly Unbalanced Data |
Authors | Thong Nguyen, Duy Nguyen, Pramod Rao |
Abstract | For several purposes in Natural Language Processing (NLP), such as information extraction, sentiment analysis, or chatbots, Named Entity Recognition (NER) plays an important role: it detects and categorizes entities in text into predefined groups such as the names of persons, locations, quantities, organizations, or percentages. In this report, we present our experiments with a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (Bi-LSTM) layer for solving NER tasks. In addition, we employ a fused input of embedding vectors (GloVe, BERT) pre-trained on large corpora to boost the generalization capacity of the model. Unfortunately, due to the heavily unbalanced distribution across the training data, both approaches attain poor performance on classes with few training samples. To overcome this challenge, we introduce an add-on classification model that splits sentences into two different sets, Weak and Strong classes, and then design a pair of Bi-LSTM-CRF models tuned to optimize performance on each set. We evaluated our models on the test set and found that our method significantly improves performance on the Weak classes, which account for a very small share of the data (approximately 0.45%) compared to the remaining classes. |
Tasks | Chatbot, Named Entity Recognition, Sentiment Analysis |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.10296v1 |
PDF | https://arxiv.org/pdf/2003.10296v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-name-entity-recognition-under-highly |
Repo | |
Framework | |
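A minimal sketch of the Bi-LSTM-CRF architecture described above, using PyTorch with the third-party `pytorch-crf` package; the fused GloVe+BERT input is stood in for by a pre-computed embedding tensor, and all sizes are illustrative assumptions rather than the paper's settings.

```python
# Minimal Bi-LSTM-CRF sketch (illustrative sizes; not the paper's exact model).
# Requires: pip install torch pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, embed_dim=1068, hidden=256, num_tags=9):
        # embed_dim stands in for concatenated GloVe (300) + BERT (768) vectors.
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)   # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)    # transition scores + Viterbi

    def loss(self, embeds, tags, mask):
        emissions = self.emit(self.lstm(embeds)[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, embeds, mask):
        emissions = self.emit(self.lstm(embeds)[0])
        return self.crf.decode(emissions, mask=mask)  # best tag sequence per sentence

# Toy usage with random "fused" embeddings: a batch of 2 sentences of length 5.
model = BiLSTMCRF()
embeds = torch.randn(2, 5, 1068)
tags = torch.zeros(2, 5, dtype=torch.long)
mask = torch.ones(2, 5, dtype=torch.bool)
print(model.loss(embeds, tags, mask).item(), model.decode(embeds, mask))
```

The paper's Weak/Strong split would then train two such models, routed by an add-on sentence classifier.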
What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI
Title | What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI |
Authors | Chaehan So |
Abstract | When people buy products online, they primarily base their decisions on the recommendations of others given in online reviews. The current work analyzed these online reviews by sentiment analysis and used the extracted sentiments as features to predict the product ratings with several machine learning algorithms. These predictions were disentangled by various methods of explainable AI (XAI) to understand whether the model showed any bias during prediction. Study 1 benchmarked these algorithms (knn, support vector machines, random forests, gradient boosting machines, XGBoost) and identified random forests and XGBoost as the best algorithms for predicting the product ratings. In Study 2, the analysis of global feature importance identified the sentiment joy and the emotional valence negative as the most predictive features. Two XAI visualization methods, local feature attributions and partial dependency plots, revealed several incorrect prediction mechanisms at the instance level. Performing the benchmarking as classification, Study 3 identified a high no-information rate of 64.4%, indicating high class imbalance as the underlying reason for the identified problems. In conclusion, good performance by machine learning algorithms must be taken with caution because the dataset, as encountered in this work, could be biased towards certain predictions. This work demonstrates how XAI methods reveal such prediction bias. |
Tasks | Feature Importance, Sentiment Analysis |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00201v1 |
PDF | https://arxiv.org/pdf/2003.00201v1.pdf |
PWC | https://paperswithcode.com/paper/what-emotions-make-one-or-five-stars |
Repo | |
Framework | |
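A hedged toy version of the Study 2 analysis (synthetic data and assumed feature names, not the paper's dataset or exact XAI stack): fit a random forest on sentiment features and inspect global feature importance.

```python
# Toy sketch: predict star ratings from sentiment features and inspect
# global feature importance (synthetic data; not the paper's dataset).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
features = ["joy", "trust", "anger", "valence_negative"]  # assumed names
X = rng.random((500, len(features)))
# Synthetic ratings dominated by joy and negative valence, echoing Study 2.
y = 3 + 2 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 0.3, 500)

model = RandomForestRegressor(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>16}: {score:.3f}")
```

On real review data, the same inspection step is what surfaces the class-imbalance bias the paper warns about.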
Stochastic Natural Language Generation Using Dependency Information
Title | Stochastic Natural Language Generation Using Dependency Information |
Authors | Elham Seifossadat, Hossein Sameti |
Abstract | This article presents a stochastic corpus-based model for generating natural language text. Our model first encodes dependency relations from training data through a feature set, then concatenates these features to produce a new dependency tree for a given meaning representation, and finally generates a natural language utterance from the produced dependency tree. We test our model on nine domains covering tabular, dialogue-act, and RDF formats. Our model outperforms corpus-based state-of-the-art methods trained on tabular datasets and achieves results comparable with neural network-based approaches trained on the dialogue act, E2E, and WebNLG datasets on the BLEU and ERR evaluation metrics. Human evaluation results further show that our model produces high-quality utterances in terms of informativeness and naturalness as well as quality. |
Tasks | Text Generation |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03897v1 |
PDF | https://arxiv.org/pdf/2001.03897v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-natural-language-generation-using |
Repo | |
Framework | |
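A toy sketch of the final stage of the pipeline described above, linearizing a dependency tree into an utterance; the feature-based construction of the tree from the meaning representation is the paper's contribution and is not reproduced here. The tree below is hypothetical.

```python
# Toy sketch of the final stage only: turning a dependency tree into a
# surface string by traversal (the paper's feature-based tree construction
# from the meaning representation is not reproduced here).
def linearize(node):
    """In-order traversal: left dependents, head word, right dependents."""
    left = [w for dep in node.get("left", []) for w in linearize(dep)]
    right = [w for dep in node.get("right", []) for w in linearize(dep)]
    return left + [node["word"]] + right

# Hypothetical tree for a restaurant-domain meaning representation.
tree = {
    "word": "serves",
    "left": [{"word": "Aromi"}],
    "right": [{"word": "food", "left": [{"word": "Italian"}]}],
}
print(" ".join(linearize(tree)))  # -> "Aromi serves Italian food"
```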
Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks
Title | Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks |
Authors | Souvik Kundu, Mahdi Nazemi, Massoud Pedram, Keith M. Chugg, Peter A. Beerel |
Abstract | The high energy cost of processing deep convolutional neural networks impedes their ubiquitous deployment in energy-constrained platforms such as embedded systems and IoT devices. This work introduces convolutional layers with pre-defined sparse 2D kernels whose support sets repeat periodically within and across filters. Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency due to reduced DRAM accesses, thus promising significant improvements in the trade-off between energy consumption and accuracy for both training and inference. To evaluate this approach, we performed experiments on two widely used datasets, CIFAR-10 and Tiny ImageNet, with sparse variants of the ResNet18 and VGG16 architectures. Compared to baseline models, our proposed sparse variants require up to 82% fewer model parameters and 5.6x fewer FLOPs with negligible loss in accuracy for ResNet18 on CIFAR-10. For VGG16 trained on Tiny ImageNet, our approach requires 5.8x fewer FLOPs and up to 83.3% fewer model parameters with a drop in top-5 (top-1) accuracy of only 1.2% (2.1%). We also compared the performance of our proposed architectures with that of ShuffleNet and MobileNetV2. Using similar hyperparameters and FLOPs, our ResNet18 variants yield an average accuracy improvement of 2.8%. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10710v2 |
PDF | https://arxiv.org/pdf/2001.10710v2.pdf |
PWC | https://paperswithcode.com/paper/pre-defined-sparsity-for-low-complexity |
Repo | |
Framework | |
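A minimal PyTorch sketch of the core idea, convolutions whose sparse kernel supports are fixed before training and repeat periodically across filters; the particular mask pattern and sizes below are illustrative assumptions, not the paper's kernel designs.

```python
# Sketch: convolution with a fixed, periodically repeating sparse kernel support.
# The random pattern below is illustrative; the paper designs its own supports.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeriodicSparseConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, period=4, support=2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        # One binary support mask per phase; filters reuse masks with period `period`,
        # so only `period` distinct patterns need to be stored.
        masks = torch.zeros(period, k * k)
        for p in range(period):
            keep = torch.randperm(k * k)[:support]   # fixed before training
            masks[p, keep] = 1.0
        mask = masks[torch.arange(out_ch) % period].view(out_ch, 1, k, k)
        self.register_buffer("mask", mask.expand(out_ch, in_ch, k, k).contiguous())

    def forward(self, x):
        # Masked positions contribute nothing and receive zero gradient,
        # so each kernel effectively has only `support` trainable weights.
        return F.conv2d(x, self.weight * self.mask, padding=1)

layer = PeriodicSparseConv2d(16, 32)
print(layer(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 32, 8, 8])
```

The periodicity is what makes the sparse weights cheap to store and index, which is where the claimed DRAM-access savings come from.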
Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis
Title | Learning Reinforced Agents with Counterfactual Simulation for Medical Automatic Diagnosis |
Authors | Junfan Lin, Ziliang Chen, Xiaodan Liang, Keze Wang, Liang Lin |
Abstract | Medical automatic diagnosis (MAD) aims to learn an agent that mimics the behavior of a human doctor, i.e., inquiring about symptoms and diagnosing diseases. Due to medical ethics concerns, it is impractical to directly apply reinforcement learning techniques to solving MAD, e.g., training a reinforced agent on human patients. Developing a patient simulator from collected patient-doctor dialogue records has been proposed as a promising approach to MAD. However, most existing works overlook the causal relationship between patient symptoms and disease diagnoses. For example, these simulators simply generate a "not-sure" response to any inquiry (i.e., symptom) that was not observed in the dialogue record. As a result, the MAD agent is usually trained without exploiting counterfactual reasoning beyond the factual observations. To address this problem, this paper presents a propensity-based patient simulator (PBPS), which facilitates the training of MAD agents by generating informative counterfactual answers along with the disease diagnosis. Specifically, our PBPS estimates the propensity score of each record from the patient-doctor dialogue, and can thus generate counterfactual answers by searching across records: the unrecorded symptom for one patient can be found in the records of other patients via propensity score matching. A progressive assurance agent (P2A) can then be trained with PBPS; it includes two separate yet cooperative branches accounting for the symptom-inquiry and disease-diagnosis actions, respectively. The disease-diagnosis branch predicts the confidence of each disease and drives the symptom-inquiry branch toward increasing that confidence, and the two branches are jointly optimized, benefiting from each other. |
Tasks | |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06534v1 |
PDF | https://arxiv.org/pdf/2003.06534v1.pdf |
PWC | https://paperswithcode.com/paper/learning-reinforced-agents-with |
Repo | |
Framework | |
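A hedged simplification of the propensity-score-matching step the abstract describes (not the paper's PBPS): to answer an inquiry about a symptom a record never mentions, fit a propensity model on the other records and copy the answer from the nearest-propensity record that does mention it.

```python
# Toy sketch of filling an unrecorded symptom answer by propensity score
# matching across patient records (simplified; not the paper's PBPS).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_symptoms = 200, 8
records = rng.integers(0, 2, size=(n, n_symptoms))    # 1 = symptom reported
observed = rng.random((n, n_symptoms)) < 0.7          # which entries were asked

def counterfactual_answer(patient_idx, symptom_idx):
    """Answer a symptom never asked of this patient via propensity matching."""
    others = np.where(observed[:, symptom_idx])[0]     # records that answered it
    features = np.delete(records, symptom_idx, axis=1) # other symptoms as covariates
    # Propensity of reporting this symptom given the remaining symptoms.
    ps = LogisticRegression(max_iter=1000).fit(
        features[others], records[others, symptom_idx]
    ).predict_proba(features)[:, 1]
    match = others[np.argmin(np.abs(ps[others] - ps[patient_idx]))]
    return records[match, symptom_idx]

print(counterfactual_answer(patient_idx=3, symptom_idx=5))
```

Replacing the blanket "not-sure" with such matched answers is what gives the RL agent counterfactual training signal.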
Not all domains are equally complex: Adaptive Multi-Domain Learning
Title | Not all domains are equally complex: Adaptive Multi-Domain Learning |
Authors | Ali Senhaji, Jenni Raitoharju, Moncef Gabbouj, Alexandros Iosifidis |
Abstract | Deep learning approaches are highly specialized and require training separate models for different tasks. Multi-domain learning looks at ways to learn a multitude of different tasks, each coming from a different domain, at once. The most common approach in multi-domain learning is to form a domain-agnostic model, the parameters of which are shared among all domains, and to learn a small number of extra domain-specific parameters for each individual new domain. However, different domains come with different levels of difficulty; parameterizing the models of all domains using the same augmented version of the domain-agnostic model leads to unnecessarily inefficient solutions, especially for easy-to-solve tasks. We propose an adaptive parameterization approach to deep neural networks for multi-domain learning. The proposed approach performs on par with the original approach while substantially reducing the number of parameters, leading to efficient multi-domain learning solutions. |
Tasks | |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11504v1 |
PDF | https://arxiv.org/pdf/2003.11504v1.pdf |
PWC | https://paperswithcode.com/paper/not-all-domains-are-equally-complex-adaptive |
Repo | |
Framework | |
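A hedged sketch of one way to realize adaptive parameterization as the abstract describes it (our reading, not the paper's exact architecture): a shared frozen backbone plus per-domain residual adapters whose capacity is matched to the domain's difficulty.

```python
# Sketch: per-domain residual adapters with domain-dependent capacity
# (illustrative; the paper's adaptive parameterization may differ).
import torch
import torch.nn as nn

class DomainAdapter(nn.Module):
    def __init__(self, dim, bottleneck):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual correction

dim = 512
shared = nn.Linear(dim, dim)                 # stands in for the frozen shared backbone
for p in shared.parameters():
    p.requires_grad = False

# Harder domains get wider adapters; easy domains get cheap ones.
adapters = nn.ModuleDict({
    "easy_domain": DomainAdapter(dim, bottleneck=8),
    "hard_domain": DomainAdapter(dim, bottleneck=128),
})

x = torch.randn(4, dim)
out = adapters["hard_domain"](shared(x))     # domain-specific forward pass
print(out.shape, sum(p.numel() for p in adapters["easy_domain"].parameters()))
```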
Distributional Robustness and Regularization in Reinforcement Learning
Title | Distributional Robustness and Regularization in Reinforcement Learning |
Authors | Esther Derman, Shie Mannor |
Abstract | Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO’s extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address $\textit{internal uncertainty}$ due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they satisfy out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning methods. |
Tasks | Decision Making |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02894v1 |
PDF | https://arxiv.org/pdf/2003.02894v1.pdf |
PWC | https://paperswithcode.com/paper/distributional-robustness-and-regularization |
Repo | |
Framework | |
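In hedged notation of our own (the paper's precise statements differ in detail), the two objects the abstract relates are the worst-case value of a policy over a Wasserstein ball of transition models, and a regularized empirical value that lower-bounds it:

```latex
% Hedged sketch in our own notation.
% \hat{P}: empirical transition model; W_p: p-Wasserstein distance;
% \epsilon: ball radius; \Omega_\epsilon: the value-function regularizer.
V^{\mathrm{rob}}_{\epsilon}(\pi)
  \;=\; \min_{P \,:\, W_p(P, \hat{P}) \le \epsilon} V_P(\pi),
\qquad
V_{\hat{P}}(\pi) - \Omega_{\epsilon}(\pi) \;\le\; V^{\mathrm{rob}}_{\epsilon}(\pi).
```

The left-hand inequality is, read as a sketch, the abstract's claim that the regularizer "lower bounds the Wasserstein distributionally robust value function": maximizing the cheap regularized objective pushes up a guaranteed floor on the expensive robust one.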
On implicit regularization: Morse functions and applications to matrix factorization
Title | On implicit regularization: Morse functions and applications to matrix factorization |
Authors | Mohamed Ali Belabbas |
Abstract | In this paper, we revisit implicit regularization from the ground up using notions from dynamical systems and invariant subspaces of Morse functions. The key contributions are a new criterion for implicit regularization—a leading contender to explain the generalization power of deep models such as neural networks—and a general blueprint to study it. We apply these techniques to settle a conjecture on implicit regularization in matrix factorization. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04264v2 |
PDF | https://arxiv.org/pdf/2001.04264v2.pdf |
PWC | https://paperswithcode.com/paper/on-implicit-regularization-morse-functions |
Repo | |
Framework | |
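For context, a hedged sketch of the standard matrix-factorization setting in which implicit-regularization conjectures of this kind are posed (our notation; the precise conjecture the paper settles may be stated differently):

```latex
% Hedged sketch of the usual setup (our notation).
% Linear measurements of a matrix, fit through an overparametrized factorization:
\min_{U \in \mathbb{R}^{n \times k}} \; f(U)
  = \tfrac{1}{2}\sum_{i=1}^{m}\big(\langle A_i, UU^{\top}\rangle - y_i\big)^2,
\qquad
\dot{U}(t) = -\nabla f(U(t)).
% Implicit regularization: among the many U with f(U)=0, which one does the
% gradient flow select from small initialization (e.g., small nuclear norm of UU^T)?
```

The paper's contribution is a dynamical-systems criterion (via invariant subspaces of Morse functions) for answering this selection question.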
Interpolated Adjoint Method for Neural ODEs
Title | Interpolated Adjoint Method for Neural ODEs |
Authors | Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets |
Abstract | In this paper, we propose a method that allows us to alleviate or completely avoid the notorious problem of numerical instability and stiffness of the adjoint method for training neural ODEs. On the backward pass, we propose to use the machinery of smooth function interpolation to restore the trajectory obtained during the forward integration. We show the viability of our approach, both in theory and in practice. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05271v1 |
PDF | https://arxiv.org/pdf/2003.05271v1.pdf |
PWC | https://paperswithcode.com/paper/interpolated-adjoint-method-for-neural-odes |
Repo | |
Framework | |
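The standard adjoint system for neural ODEs (as in Chen et al., 2018) makes the role of the forward trajectory explicit; on a hedged reading of the abstract, the proposal is to evaluate it along a smooth interpolant of stored forward states rather than re-integrating the state backward:

```latex
% Forward dynamics and the standard adjoint system (Chen et al., 2018):
\dot{z}(t) = f(z(t), t, \theta), \qquad
a(t) = \frac{\partial L}{\partial z(t)}, \qquad
\dot{a}(t) = -\,a(t)^{\top}\frac{\partial f(z(t), t, \theta)}{\partial z},
\qquad
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top}
  \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt.
% Hedged reading of the paper: evaluate these with a smooth interpolant
% \tilde{z}(t) of stored forward states instead of re-integrating z backward,
% which is the step that is unstable or stiff in the vanilla adjoint method.
```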
Improving S&P stock prediction with time series stock similarity
Title | Improving S&P stock prediction with time series stock similarity |
Authors | Lior Sidi |
Abstract | Stock market prediction with forecasting algorithms is a popular topic these days, and most forecasting algorithms train only on data collected for a particular stock. In this paper, we enriched the stock data with related stocks, just as a professional trader would do, to improve the stock prediction models. We tested five different similarity functions and found cointegration similarity to yield the best improvement in the prediction model. We evaluated the models on seven S&P stocks from various industries over a five-year period. The prediction model trained on similar stocks had significantly better results, with a mean accuracy of 0.55 and a profit of 19.782, compared to the state-of-the-art model with an accuracy of 0.52 and a profit of 6.6. |
Tasks | Stock Market Prediction, Stock Prediction, Time Series |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.05784v1 |
PDF | https://arxiv.org/pdf/2002.05784v1.pdf |
PWC | https://paperswithcode.com/paper/improving-sp-stock-prediction-with-time |
Repo | |
Framework | |
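A minimal sketch of the cointegration-based similarity the paper found most useful, using the Engle-Granger test from `statsmodels` on synthetic price series (the paper's actual feature pipeline is not reproduced here).

```python
# Rank candidate stocks by cointegration with a target series, then use the
# most similar ones to enrich the training data (toy sketch, synthetic prices).
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
common = np.cumsum(rng.normal(size=500))            # shared random-walk driver
target = common + rng.normal(scale=0.5, size=500)
candidates = {
    "cointegrated_peer": common + rng.normal(scale=0.5, size=500),
    "unrelated_stock": np.cumsum(rng.normal(size=500)),
}

# Lower Engle-Granger p-value => stronger evidence of cointegration (similarity).
ranked = sorted(
    ((name, coint(target, series)[1]) for name, series in candidates.items()),
    key=lambda item: item[1],
)
for name, pval in ranked:
    print(f"{name:>18}: p = {pval:.4f}")
```

Cointegration is a natural fit here because two price series can be strongly related over years while their short-term returns look uncorrelated.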
OmniTact: A Multi-Directional High Resolution Touch Sensor
Title | OmniTact: A Multi-Directional High Resolution Touch Sensor |
Authors | Akhil Padmanabha, Frederik Ebert, Stephen Tian, Roberto Calandra, Chelsea Finn, Sergey Levine |
Abstract | Incorporating touch as a sensing modality for robots can enable finer and more robust manipulation skills. Existing tactile sensors are either flat, have small sensitive fields or only provide low-resolution signals. In this paper, we introduce OmniTact, a multi-directional high-resolution tactile sensor. OmniTact is designed to be used as a fingertip for robotic manipulation with robotic hands, and uses multiple micro-cameras to detect multi-directional deformations of a gel-based skin. This provides a rich signal from which a variety of different contact state variables can be inferred using modern image processing and computer vision methods. We evaluate the capabilities of OmniTact on a challenging robotic control task that requires inserting an electrical connector into an outlet, as well as a state estimation problem that is representative of those typically encountered in dexterous robotic manipulation, where the goal is to infer the angle of contact of a curved finger pressing against an object. Both tasks are performed using only touch sensing and deep convolutional neural networks to process images from the sensor’s cameras. We compare with a state-of-the-art tactile sensor that is only sensitive on one side, as well as a state-of-the-art multi-directional tactile sensor, and find that OmniTact’s combination of high-resolution and multi-directional sensing is crucial for reliably inserting the electrical connector and allows for higher accuracy in the state estimation task. Videos and supplementary material can be found at https://sites.google.com/berkeley.edu/omnitact |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.06965v1 |
PDF | https://arxiv.org/pdf/2003.06965v1.pdf |
PWC | https://paperswithcode.com/paper/omnitact-a-multi-directional-high-resolution |
Repo | |
Framework | |
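As a hedged illustration of the paper's state-estimation setup, a toy network that fuses images from several micro-cameras to regress a contact angle; every size and layer choice below is an assumption, not the paper's architecture.

```python
# Toy sketch: fusing images from multiple micro-cameras to regress a contact
# angle, in the spirit of the paper's state-estimation task (sizes assumed).
import torch
import torch.nn as nn

class MultiCamAngleNet(nn.Module):
    def __init__(self, n_cameras=5):
        super().__init__()
        self.encoder = nn.Sequential(               # shared per-camera CNN
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 * n_cameras, 1)    # fused features -> angle

    def forward(self, views):                       # views: (B, n_cams, 3, H, W)
        feats = [self.encoder(views[:, i]) for i in range(views.shape[1])]
        return self.head(torch.cat(feats, dim=1))

net = MultiCamAngleNet()
print(net(torch.randn(2, 5, 3, 64, 64)).shape)      # torch.Size([2, 1])
```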
Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT
Title | Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT |
Authors | Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, Caiming Xiong |
Abstract | There is an increasing amount of literature claiming that deep neural networks are brittle when dealing with adversarial examples created maliciously. It is unclear, however, how the models will perform in realistic scenarios where \textit{natural rather than malicious} adversarial instances often exist. This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data, particularly keyboard typos that occur inadvertently. Intensive experiments on sentiment analysis and question answering benchmarks indicate that: (i) typos in different words of a sentence do not have equal influence; typos in informative words cause more severe damage; (ii) mistyping a character is the most damaging operation, compared with insertion, deletion, etc.; (iii) humans and machines focus on different cues when recognizing adversarial attacks. |
Tasks | Question Answering, Sentiment Analysis |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2003.04985v1 |
PDF | https://arxiv.org/pdf/2003.04985v1.pdf |
PWC | https://paperswithcode.com/paper/adv-bert-bert-is-not-robust-on-misspellings |
Repo | |
Framework | |
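A hedged toy generator of the kind of "natural" keyboard typos the paper studies (our own sketch, not the paper's attack code); its output can be fed to any downstream classifier, e.g., a BERT sentiment model, to probe robustness.

```python
# Toy generator of "natural" keyboard typos for robustness probing
# (illustrative; not the paper's attack implementation).
import random

ADJACENT = {  # tiny QWERTY neighborhood map; extend as needed (assumption)
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "s": "awedxz", "t": "rfgy",
}

def mistype(word, rng):
    """Replace one character with a keyboard neighbor, if we know one."""
    slots = [i for i, ch in enumerate(word.lower()) if ch in ADJACENT]
    if not slots:
        return word
    i = rng.choice(slots)
    return word[:i] + rng.choice(ADJACENT[word[i].lower()]) + word[i + 1:]

rng = random.Random(0)
sentence = "the movie was surprisingly entertaining"
words = sentence.split()
# Per finding (i) in the abstract, damage concentrates in informative words,
# so we perturb the sentiment-bearing adjective.
words[-1] = mistype(words[-1], rng)
print(" ".join(words))
```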