Paper Group NANR 4
Amharic Text Normalization with Sequence-to-Sequence Models. Counterfactual Regularization for Model-Based Reinforcement Learning. Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks. Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning. Reflection-based Word Attribute Transfer. Consistent Meta-Reinforceme …
Amharic Text Normalization with Sequence-to-Sequence Models
Title | Amharic Text Normalization with Sequence-to-Sequence Models |
Authors | Anonymous |
Abstract | All areas of language and speech technology, directly or indirectly, require handling of real text. In addition to ordinary words and names, real text contains non-standard words (NSWs), including numbers, abbreviations, dates, currency, amounts, and acronyms. Typically, one cannot find NSWs in a dictionary, nor can one find their pronunciation by applying ordinary letter-to-sound rules. In several NLP applications, it is desirable to normalize text by replacing such non-standard words with a consistently formatted and contextually appropriate variant. To address this challenge, we model the problem as character-level sequence-to-sequence learning, in which we map a sequence of input characters to a sequence of output words. The model consists of two neural networks: an encoder and a decoder. The encoder maps the input characters to a fixed-dimensional vector, and the decoder generates the output words. We achieve an accuracy of 94.8%, which is promising given the resources we use. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJe-HkBKDS |
https://openreview.net/pdf?id=SJe-HkBKDS | |
PWC | https://paperswithcode.com/paper/amharic-text-normalization-with-sequence-to |
Repo | |
Framework | |
Counterfactual Regularization for Model-Based Reinforcement Learning
Title | Counterfactual Regularization for Model-Based Reinforcement Learning |
Authors | Anonymous |
Abstract | In sequential tasks, planning-based agents have a number of advantages over model-free agents, including sample efficiency and interpretability. Recurrent action-conditional latent dynamics models trained from pixel-level observations have been shown to predict future observations conditioned on agent actions accurately enough for planning in some pixel-based control tasks. Typically, models of this type are trained to reconstruct sequences of ground-truth observations, given ground-truth actions. However, an action-conditional model can take input actions and states other than the ground truth, to generate predictions of unobserved counterfactual states. Because counterfactual state predictions are generated by differentiable networks, relationships among counterfactual states can be included in a training objective. We explore the possibilities of counterfactual regularization terms applicable during training of action-conditional sequence models. We evaluate their effect on pixel-level prediction accuracy and model-based agent performance, and we show that counterfactual regularization improves the performance of model-based agents in test-time environments that differ from training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlk71rYvH |
https://openreview.net/pdf?id=rJlk71rYvH | |
PWC | https://paperswithcode.com/paper/counterfactual-regularization-for-model-based |
Repo | |
Framework | |
Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks
Title | Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks |
Authors | Anonymous |
Abstract | It has been widely recognized that adversarial examples can be easily crafted to fool deep networks, a vulnerability that mainly stems from the locally non-linear behavior near input examples. Applying mixup in training provides an effective mechanism to improve generalization performance and model robustness against adversarial perturbations, since it introduces globally linear behavior in between training examples. However, in previous work, mixup-trained models only passively defend against adversarial attacks at inference time by directly classifying the inputs, so the induced global linearity is not well exploited. Namely, because adversarial perturbations are local, it would be more efficient to actively break the locality via the globality of the model predictions. Inspired by simple geometric intuition, we develop an inference principle, named mixup inference (MI), for mixup-trained models. MI mixes the input with other random clean samples, which can shrink and transfer the equivalent perturbation if the input is adversarial. Our experiments on CIFAR-10 and CIFAR-100 demonstrate that MI can further improve the adversarial robustness of models trained by mixup and its variants. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxtC2VtPB |
https://openreview.net/pdf?id=ByxtC2VtPB | |
PWC | https://paperswithcode.com/paper/mixup-inference-better-exploiting-mixup-to-1 |
Repo | |
Framework | |
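The mixup inference idea described in the abstract above (mix the input with random clean samples, then classify the mixes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the mixing ratio `lam`, the number of samples, the averaging of predictions, and the toy classifier are not taken from the paper, which may define the procedure differently.

```python
import random


def mixup(x, x_s, lam):
    """Elementwise convex combination lam * x + (1 - lam) * x_s."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, x_s)]


def mixup_inference(predict, x, clean_pool, lam=0.6, n_samples=8, rng=None):
    """Average predict() over mixes of x with randomly drawn clean samples.

    lam and n_samples are illustrative defaults, not values from the paper.
    """
    rng = rng or random.Random(0)
    total = None
    for _ in range(n_samples):
        x_s = rng.choice(clean_pool)          # random clean sample
        probs = predict(mixup(x, x_s, lam))   # classify the mixed input
        total = probs if total is None else [t + p for t, p in zip(total, probs)]
    return [t / n_samples for t in total]


def toy_predict(x):
    """Hypothetical stand-in classifier: two-class probabilities from the mean of x."""
    p = min(max(sum(x) / len(x), 0.0), 1.0)
    return [1.0 - p, p]


clean_pool = [[0.1, 0.2], [0.8, 0.9], [0.4, 0.5]]
probs = mixup_inference(toy_predict, [0.7, 0.7], clean_pool)
```

In practice `predict` would be a mixup-trained network and `clean_pool` a set of clean training or validation inputs; the toy classifier only keeps the sketch self-contained.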
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Title | Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning |
Authors | Anonymous |
Abstract | Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point out pitfalls of existing metrics. Avoiding these pitfalls, we perform a broad study of different ensembling techniques. To provide more insight into the broad comparison, we introduce the deep ensemble equivalent (DEE) and show that many sophisticated ensembling techniques are equivalent to an ensemble of very few independently trained networks in terms of the test log-likelihood. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxI5gHKDr |
https://openreview.net/pdf?id=BJxI5gHKDr | |
PWC | https://paperswithcode.com/paper/pitfalls-of-in-domain-uncertainty-estimation |
Repo | |
Framework | |
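One way to read the deep ensemble equivalent (DEE) from the abstract above is as the size of an ensemble of independently trained networks that matches a given method's test log-likelihood. The sketch below follows that reading; the linear interpolation between integer ensemble sizes, the capping at the largest measured size, and all numbers are assumptions for illustration, not details confirmed by the abstract.

```python
def deep_ensemble_equivalent(de_log_likelihoods, target_ll):
    """Smallest (interpolated) deep-ensemble size whose test log-likelihood
    reaches target_ll.

    de_log_likelihoods[i] is the test log-likelihood of an ensemble of i + 1
    independently trained networks, assumed to be non-decreasing in i.
    """
    if target_ll <= de_log_likelihoods[0]:
        return 1.0
    for i in range(1, len(de_log_likelihoods)):
        lo, hi = de_log_likelihoods[i - 1], de_log_likelihoods[i]
        if target_ll <= hi:
            # Sizes i and i + 1 bracket the target; interpolate linearly.
            return i + (target_ll - lo) / (hi - lo)
    # Target exceeds every measured ensemble: cap at the largest size.
    return float(len(de_log_likelihoods))


# Made-up log-likelihoods for ensembles of 1, 2, 3 and 4 networks.
curve = [-0.30, -0.25, -0.23, -0.22]
dee = deep_ensemble_equivalent(curve, -0.24)
```

With these made-up numbers, a method scoring -0.24 falls halfway between the two-network and three-network ensembles.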
Reflection-based Word Attribute Transfer
Title | Reflection-based Word Attribute Transfer |
Authors | Anonymous |
Abstract | We propose a word attribute transfer framework based on reflection to obtain a word vector with an inverted target attribute for a given word in a word embedding space. Word embeddings based on Pointwise Mutual Information (PMI) represent such analogic relations as king - man + woman ≈ queen. These relations can be used for changing a word’s attribute from king to queen by changing its gender. This attribute transfer can be performed by subtracting a difference vector man - woman from king when we have explicit knowledge of the gender of the given word king. However, this knowledge cannot be developed for various words and attributes in practice. For transferring queen into king in this analogy-based manner, we need to know that queen denotes a female and add the difference vector to it. In this work, we transfer such binary attributes based on the assumption that such a transfer mapping becomes the identity mapping when we apply it twice. We introduce a framework based on reflection mapping that satisfies this property; queen should be transferred back to king with the same mapping as the transfer from king to queen. Experimental results show that the proposed method can transfer the word attributes of the given words, and does not change the words that do not have the target attributes. |
Tasks | Word Attribute Transfer, Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxoX6EKvB |
https://openreview.net/pdf?id=HyxoX6EKvB | |
PWC | https://paperswithcode.com/paper/reflection-based-word-attribute-transfer |
Repo | |
Framework | |
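The property the abstract above relies on, a mapping that becomes the identity when applied twice, holds for a geometric reflection through a hyperplane. The sketch below uses a fixed mirror with normal vector `a` passing through point `c`; both are hypothetical constants here, whereas a learned model would presumably estimate the mirror from data. Vectors lying on the mirror are left unchanged, which is one way to picture words that lack the target attribute.

```python
def reflect(v, a, c):
    """Reflect vector v through the hyperplane with normal a passing through c."""
    dot_va = sum((vi - ci) * ai for vi, ai, ci in zip(v, a, c))
    scale = 2.0 * dot_va / sum(ai * ai for ai in a)
    return [vi - scale * ai for vi, ai in zip(v, a)]


a = [1.0, 0.0]   # mirror normal (hypothetical attribute direction)
c = [0.0, 0.0]   # a point on the mirror

v = [2.0, 3.0]
once = reflect(v, a, c)          # attribute component flipped
twice = reflect(once, a, c)      # reflecting again recovers v
on_mirror = reflect([0.0, 5.0], a, c)
```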
Consistent Meta-Reinforcement Learning via Model Identification and Experience Relabeling
Title | Consistent Meta-Reinforcement Learning via Model Identification and Experience Relabeling |
Authors | Anonymous |
Abstract | Reinforcement learning algorithms can acquire policies for complex tasks automatically; however, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning has enabled agents to leverage prior experience to adapt quickly to new tasks, the performance of these methods depends crucially on how close the new task is to the previously experienced tasks. Current approaches either cannot extrapolate well or can do so only at the expense of requiring extremely large amounts of data due to on-policy training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. It is based on a simple insight: dynamics models can be adapted efficiently and consistently with off-policy data, even if policies and value functions cannot. These dynamics models can then be used to continue training policies for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygSLlStwS |
https://openreview.net/pdf?id=SygSLlStwS | |
PWC | https://paperswithcode.com/paper/consistent-meta-reinforcement-learning-via |
Repo | |
Framework | |
Geometry-aware Generation of Adversarial and Cooperative Point Clouds
Title | Geometry-aware Generation of Adversarial and Cooperative Point Clouds |
Authors | Anonymous |
Abstract | Recent studies show that machine learning models are vulnerable to adversarial examples. In the 2D image domain, these examples are obtained by adding imperceptible noise to natural images. This paper studies adversarial generation of point clouds by learning to deform those approximating object surfaces of certain categories. As 2D manifolds embedded in the 3D Euclidean space, object surfaces enjoy the general properties of smoothness and fairness. We thus argue that in order to achieve imperceptible surface shape deformations, adversarial point clouds should have the same properties with similar degrees of smoothness/fairness to the benign ones, while being close to the benign ones as well when measured under certain distance metrics of point clouds. To this end, we propose a novel loss function to account for imperceptible, geometry-aware deformations of point clouds, and use the proposed loss in an adversarial objective to attack representative models of point set classifiers. Experiments show that our proposed method achieves stronger attacks than existing methods, without introducing noticeable outliers and surface irregularities. In this work, we also investigate the opposite direction, which learns to deform point clouds of object surfaces in the same geometry-aware, but cooperative manner. Cooperatively generated point clouds are more favored by machine learning models in terms of improved classification confidence or accuracy. We present experiments verifying that our proposed objective succeeds in learning cooperative shape deformations. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bklr0kBKvB |
https://openreview.net/pdf?id=Bklr0kBKvB | |
PWC | https://paperswithcode.com/paper/geometry-aware-generation-of-adversarial-and |
Repo | |
Framework | |
Unsupervised Learning of Automotive 3D Crash Simulations using LSTMs
Title | Unsupervised Learning of Automotive 3D Crash Simulations using LSTMs |
Authors | Amin Abbasloo, Jochen Garcke, Rodrigo Iza-Teran |
Abstract | Long short-term memory (LSTM) networks exhibit temporal dynamic behavior through feedback connections and seem a natural choice for learning sequences of 3D meshes. We introduce an approach for dynamic mesh representations as used for numerical simulations of car crashes. To bypass the complication of using 3D meshes, we transform the surface mesh sequences into spectral descriptors that efficiently encode the shape. A two-branch LSTM-based network architecture is chosen to learn the representations and dynamics of the crash during the simulation. The architecture is based on unsupervised video prediction by an LSTM without any convolutional layer. It uses an encoder LSTM to map an input sequence into a fixed-length vector representation. On this representation, one decoder LSTM performs the reconstruction of the input sequence, while the other decoder LSTM predicts the future behavior by receiving initial steps of the sequence as seed. The spatio-temporal error behavior of the model is analysed to study how well the model can extrapolate the learned spectral descriptors into the future, that is, how well it has learned to represent the underlying dynamical structural mechanics. Considering that only a few training examples are available, which is the typical case for numerical simulations, the network performs very well. |
Tasks | Video Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklekANtwr |
https://openreview.net/pdf?id=BklekANtwr | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-automotive-3d-crash |
Repo | |
Framework | |
Accelerate DNN Inference By Inter-Operator Parallelization
Title | Accelerate DNN Inference By Inter-Operator Parallelization |
Authors | Anonymous |
Abstract | High utilization is key to achieving high efficiency for deep neural networks. Existing deep learning frameworks have focused on improving the performance of individual operators but have ignored parallelization between operators. This leads to low device utilization, especially for complex deep neural networks (DNNs) with many small operations, such as Inception and NASNet. To make complex DNNs more efficient, operators need to be executed in parallel. However, a naive greedy schedule leads to heavy resource contention and does not yield the best performance. In this work, we propose Deep Optimal Scheduling (DOS), a general dynamic programming algorithm that finds an optimal schedule to improve utilization via parallel execution. Specifically, DOS optimizes the execution for given hardware and inference settings. Our experiments demonstrate that DOS consistently outperforms existing deep learning libraries by 1.2× to 1.4× on widely used complex DNNs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJezqlrKvr |
https://openreview.net/pdf?id=HJezqlrKvr | |
PWC | https://paperswithcode.com/paper/accelerate-dnn-inference-by-inter-operator |
Repo | |
Framework | |
End-to-end learning of energy-based representations for irregularly-sampled signals and images
Title | End-to-end learning of energy-based representations for irregularly-sampled signals and images |
Authors | Anonymous |
Abstract | In numerous domains, including earth observation, medical imaging, and astrophysics, available image and signal datasets often involve irregular space-time sampling patterns and large missing-data rates. These sampling properties are a critical obstacle to applying state-of-the-art learning-based methods (e.g., auto-encoders, CNNs) so as to fully benefit from the available large-scale observations and reach breakthroughs in the reconstruction and identification of processes of interest. In this paper, we address the end-to-end learning of representations of signals, images and image sequences from irregularly-sampled data, i.e., when the training data involve missing data. From an analogy to Bayesian formulation, we consider energy-based representations. Two energy forms are investigated: one derived from auto-encoders and one relating to Gibbs energies. The learning stage of these energy-based representations (or priors) involves a joint interpolation issue, which amounts to solving an energy minimization problem under observation constraints. Using a neural-network-based implementation of the considered energy forms, we can state an end-to-end learning scheme from irregularly-sampled data. We demonstrate the relevance of the proposed representations for different case studies, namely multivariate time series, 2D images and image sequences. |
Tasks | Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe8pxSFwr |
https://openreview.net/pdf?id=rJe8pxSFwr | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-energy-based |
Repo | |
Framework | |
Beyond GANs: Transforming without a Target Distribution
Title | Beyond GANs: Transforming without a Target Distribution |
Authors | Anonymous |
Abstract | While generative neural networks can learn to transform a specific input dataset into a specific target dataset, they require having just such a paired set of input/output datasets. For instance, to fool the discriminator, a generative adversarial network (GAN) exclusively trained to transform images of black-haired men to blond-haired men would need to change gender-related characteristics as well as hair color when given images of black-haired women as input. This is problematic, as often it is possible to obtain a pair of (source, target) distributions but then have a second source distribution where the target distribution is unknown. The computational challenge is that generative models are good at generation within the manifold of the data that they are trained on. However, generating new samples outside of the manifold or extrapolating “out-of-sample” is a much harder problem that has been less well studied. To address this, we introduce a technique called neuron editing that learns how neurons encode an edit for a particular transformation in a latent space. We use an autoencoder to decompose the variation within the dataset into activations of different neurons and generate transformed data by defining an editing transformation on those neurons. By performing the transformation in a trained latent space, we encode fairly complex and non-linear transformations to the data with much simpler distribution shifts to the neurons’ activations. Our technique is general and works on a wide variety of data domains and applications. We first demonstrate it on image transformations and then move to our two main biological applications: removal of batch artifacts representing unwanted noise and modeling the effect of drug treatments to predict synergy between drugs. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1lDSCEYPH |
https://openreview.net/pdf?id=H1lDSCEYPH | |
PWC | https://paperswithcode.com/paper/beyond-gans-transforming-without-a-target |
Repo | |
Framework | |
Multi-Step Decentralized Domain Adaptation
Title | Multi-Step Decentralized Domain Adaptation |
Authors | Anonymous |
Abstract | Despite the recent breakthroughs in unsupervised domain adaptation (uDA), no prior work has studied the challenges of applying these methods in practical machine learning scenarios. In this paper, we highlight two significant bottlenecks for uDA, namely excessive centralization and poor support for distributed domain datasets. Our proposed framework, MDDA, is powered by a novel collaborator selection algorithm and an effective distributed adversarial training method, and allows for uDA methods to work in a decentralized and privacy-preserving way. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkg0olStDr |
https://openreview.net/pdf?id=Hkg0olStDr | |
PWC | https://paperswithcode.com/paper/multi-step-decentralized-domain-adaptation |
Repo | |
Framework | |
Extracting and Leveraging Feature Interaction Interpretations
Title | Extracting and Leveraging Feature Interaction Interpretations |
Authors | Anonymous |
Abstract | Recommendation is a prevalent application of machine learning that affects many users; therefore, it is crucial for recommender models to be accurate and interpretable. In this work, we propose a method to both interpret and augment the predictions of black-box recommender systems. In particular, we propose to extract feature interaction interpretations from a source recommender model and explicitly encode these interactions in a target recommender model, where both source and target models are black-boxes. By not assuming the structure of the recommender system, our approach can be used in general settings. In our experiments, we focus on a prominent use of machine learning recommendation: ad-click prediction. We found that our interaction interpretations are both informative and predictive, i.e., significantly outperforming existing recommender models. What’s more, the same approach to interpreting interactions can provide new insights into domains even beyond recommendation. |
Tasks | Recommendation Systems |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgnhTEtDS |
https://openreview.net/pdf?id=BkgnhTEtDS | |
PWC | https://paperswithcode.com/paper/extracting-and-leveraging-feature-interaction |
Repo | |
Framework | |
On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints
Title | On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints |
Authors | Anonymous |
Abstract | The knowledge that humans hold about a problem often extends far beyond a set of training data and output labels. While the success of deep learning mostly relies on supervised training, important properties cannot be inferred efficiently from end-to-end annotations alone, for example causal relations or domain-specific invariances. We present a general technique to supplement supervised training with prior knowledge expressed as relations between training instances. We illustrate the method on the task of visual question answering (VQA) to exploit various auxiliary annotations, including relations of equivalence and of logical entailment between questions. Existing methods to use these annotations, including auxiliary losses and data augmentation, cannot guarantee the strict inclusion of these relations into the model since they require a careful balancing against the end-to-end objective. Our method uses these relations to shape the embedding space of the model, and treats them as strict constraints on its learned representations. In the context of VQA, this approach brings significant improvements in accuracy and robustness, in particular over the common practice of incorporating the constraints as a soft regularizer. We also show that incorporating this type of prior knowledge with our method brings consistent improvements, independent of the amount of supervised data used. It demonstrates the value of an additional training signal that is otherwise difficult to extract from end-to-end annotations alone. |
Tasks | Data Augmentation, Question Answering, Visual Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1ggKyrYwB |
https://openreview.net/pdf?id=H1ggKyrYwB | |
PWC | https://paperswithcode.com/paper/on-incorporating-semantic-prior-knowlegde-in |
Repo | |
Framework | |
Learning Curves for Deep Neural Networks: A field theory perspective
Title | Learning Curves for Deep Neural Networks: A field theory perspective |
Authors | Anonymous |
Abstract | A series of recent works established a rigorous correspondence between very wide deep neural networks (DNNs), trained in a particular manner, and noiseless Bayesian inference with a certain Gaussian Process (GP) known as the Neural Tangent Kernel (NTK). Here we extend a known field-theory formalism for GP inference to get a detailed understanding of learning curves in DNNs trained in the regime of this correspondence (NTK regime). In particular, a renormalization-group approach is used to show that noiseless GP inference using NTK, which lacks a good analytical handle, can be well approximated by noisy GP inference on a related kernel we call the renormalized NTK. Following this, a perturbation-theory analysis is carried out in one over the dataset size, yielding analytical expressions for the (fixed-teacher/fixed-target) leading and sub-leading asymptotics of the learning curves. At least for uniform datasets, a coherent picture emerges wherein fully-connected DNNs have a strong implicit bias towards functions which are low-order polynomials of the input. |
Tasks | Bayesian Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklwGlHFvH |
https://openreview.net/pdf?id=SklwGlHFvH | |
PWC | https://paperswithcode.com/paper/learning-curves-for-deep-neural-networks-a-1 |
Repo | |
Framework | |
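For orientation on the noiseless versus noisy GP inference contrasted in the abstract above, the standard GP regression posterior mean can be written as follows. The notation is chosen here for illustration and is not taken from the paper: X denotes the training inputs, y the training targets, K the kernel, x_* a test input, and \sigma^2 the observation-noise variance. The form of the renormalized NTK itself is not given in the abstract and is not reproduced here.

```latex
% Posterior mean of GP regression with observation-noise variance \sigma^2
\bar{f}(x_*) = K(x_*, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1} y

% Noiseless inference is the special case \sigma^2 = 0
\bar{f}(x_*) = K(x_*, X)\,K(X, X)^{-1} y
```

Read this way, the abstract's claim is that the second expression with K taken as the NTK is well approximated by the first expression with a different, renormalized kernel and a nonzero \sigma^2.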