Paper Group NANR 53
Machine Truth Serum. Adversarial training with perturbation generator networks. Conditional Invertible Neural Networks for Guided Image Generation. To Relieve Your Headache of Training an MRF, Take AdVIL. Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension. Incorporating BERT into Neura …
Machine Truth Serum
Title | Machine Truth Serum |
Authors | Anonymous |
Abstract | The wisdom of the crowd reveals a striking fact: the majority answer from a crowd is often more accurate than that of any individual expert. The same story plays out in machine learning - ensemble methods leverage this idea to combine multiple learning algorithms and obtain better classification performance. A popular example is the celebrated Random Forest, which applies the majority-voting rule to aggregate different decision trees into a final prediction. Nonetheless, these aggregation rules fail when the majority is more likely to be wrong. In this paper, we extend the idea proposed in Bayesian Truth Serum that “a surprisingly more popular answer is more likely the true answer” to classification problems. The challenge is to define or detect when an answer should be considered “surprising”. We present two machine-learning-aided methods that aim to reveal the truth when the minority, rather than the majority, holds the true answer. Our experiments on real-world datasets show that better classification performance can be obtained than by always trusting majority voting. Our proposed methods also outperform popular ensemble algorithms, and our approach can be applied generically as a subroutine in ensemble methods to replace the majority-voting rule. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1eQuCVFvB |
https://openreview.net/pdf?id=S1eQuCVFvB | |
PWC | https://paperswithcode.com/paper/machine-truth-serum-1 |
Repo | |
Framework | |
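The abstract builds on the “surprisingly popular” rule from Bayesian Truth Serum; the paper's two machine-learning-aided methods are not detailed here, but a minimal sketch of the underlying rule makes the idea concrete: the answer whose actual vote share most exceeds its predicted vote share is chosen, even when it is not the majority answer. The function name and inputs below are illustrative.

```python
from collections import Counter

def surprisingly_popular(votes, predicted_share):
    """Pick the answer whose actual vote share most exceeds its predicted share.

    votes           : answers cast by the crowd, e.g. ["A", "A", "B"]
    predicted_share : each answer's predicted popularity, e.g. {"A": 0.8, "B": 0.2}
    """
    n = len(votes)
    actual_share = {ans: c / n for ans, c in Counter(votes).items()}
    # "Surprise" = actual popularity minus predicted popularity.
    surprise = {ans: actual_share.get(ans, 0.0) - predicted_share.get(ans, 0.0)
                for ans in set(actual_share) | set(predicted_share)}
    return max(surprise, key=surprise.get)

# The majority says "A", but "B" is more popular than the crowd predicted,
# so the surprisingly popular rule returns "B".
print(surprisingly_popular(["A"] * 6 + ["B"] * 4, {"A": 0.8, "B": 0.2}))  # -> B
```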
Adversarial training with perturbation generator networks
Title | Adversarial training with perturbation generator networks |
Authors | Anonymous |
Abstract | Despite the remarkable development of recent deep learning techniques, neural networks are still vulnerable to adversarial attacks, i.e., methods that fool the networks with perturbations too small for human eyes to perceive. Many adversarial training methods have been introduced to solve this problem, using adversarial examples as training data. However, the attack methods used in these techniques are fixed, making the model robust only to the attacks seen in training - a problem widely known as overfitting. In this paper, we suggest a novel adversarial training approach. In addition to the classifier, our method adds another neural network that generates the most effective adversarial perturbation by finding the weakness of the classifier. This perturbation generator network is trained to produce perturbations that maximize the loss function of the classifier, and these adversarial examples train the classifier with the true labels. In short, the two networks compete with each other, performing a minimax game. In this scenario, the attack patterns created by the generator network adapt to the classifier, mitigating the overfitting problem mentioned above. We theoretically prove that our minimax optimization problem is equivalent to minimizing the adversarial loss. Beyond this, we propose an evaluation method that can accurately compare a wide range of adversarial algorithms. Experiments with various datasets show that our method outperforms conventional adversarial algorithms. |
Tasks | Adversarial Attack |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1xXiREKDB |
https://openreview.net/pdf?id=S1xXiREKDB | |
PWC | https://paperswithcode.com/paper/adversarial-training-with-perturbation |
Repo | |
Framework | |
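The minimax game described above can be sketched as a training loop in which a generator network produces a bounded perturbation that maximizes the classifier loss, and the classifier is then trained on the perturbed inputs with the true labels. The architecture, the L-infinity bound eps, and the alternating update schedule below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps an input image to a bounded perturbation (assumed L_inf bound eps)."""
    def __init__(self, channels=1, eps=0.3):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.eps * torch.tanh(self.net(x))   # perturbation in [-eps, eps]

def train_step(classifier, generator, opt_c, opt_g, x, y):
    # 1) Generator step: maximize the classifier loss (minimize its negative).
    loss_g = -F.cross_entropy(classifier((x + generator(x)).clamp(0, 1)), y)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # 2) Classifier step: minimize the loss on adversarial examples, true labels.
    with torch.no_grad():
        delta = generator(x)
    loss_c = F.cross_entropy(classifier((x + delta).clamp(0, 1)), y)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    return loss_c.item()
```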
Conditional Invertible Neural Networks for Guided Image Generation
Title | Conditional Invertible Neural Networks for Guided Image Generation |
Authors | Anonymous |
Abstract | In this work, we address the task of natural image generation guided by a conditioning input. We introduce a new architecture called conditional invertible neural network (cINN). It combines the purely generative INN model with an unconstrained feed-forward network, which efficiently pre-processes the conditioning input into useful features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INNs and other normalizing flow models have received very little attention in the literature in contrast to GANs, we find that cINNs can achieve comparable quality, with some remarkable properties absent in cGANs, e.g. apparent immunity to mode collapse. We demonstrate these properties for the tasks of MNIST digit generation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way. |
Tasks | Colorization, Image Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxC9TEtPH |
https://openreview.net/pdf?id=SyxC9TEtPH | |
PWC | https://paperswithcode.com/paper/conditional-invertible-neural-networks-for |
Repo | |
Framework | |
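A hedged sketch of the two ingredients named in the abstract: an affine coupling block whose scale/shift subnetwork also receives conditioning features (here `c`, assumed to be the output of the feed-forward conditioning network), and the maximum-likelihood loss that follows from the change-of-variables formula. Layer sizes and the tanh clamping of the scales are assumptions.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One invertible coupling block whose scale/shift depends on the condition."""
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        # Subnetwork sees one half of x plus the conditioning features c.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, c):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, c], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales bounded for stability
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)   # output and log|det J|

    def inverse(self, z, c):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, c], dim=1)).chunk(2, dim=1)
        x2 = (z2 - t) * torch.exp(-torch.tanh(s))
        return torch.cat([z1, x2], dim=1)

def nll_loss(z, log_det):
    """Maximum-likelihood training objective under a standard normal latent."""
    log_pz = -0.5 * (z ** 2).sum(dim=1)        # up to an additive constant
    return -(log_pz + log_det).mean()
```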
To Relieve Your Headache of Training an MRF, Take AdVIL
Title | To Relieve Your Headache of Training an MRF, Take AdVIL |
Authors | Anonymous |
Abstract | We propose a black-box algorithm called {\it Adversarial Variational Inference and Learning} (AdVIL) to perform inference and learning on a general Markov random field (MRF). AdVIL employs two variational distributions to approximately infer the latent variables and estimate the partition function of an MRF, respectively. The two variational distributions provide an estimate of the negative log-likelihood of the MRF as a minimax optimization problem, which is solved by stochastic gradient descent. AdVIL is proven convergent under certain conditions. On one hand, compared with contrastive divergence, AdVIL requires a minimal assumption about the model structure and can deal with a broader family of MRFs. On the other hand, compared with existing black-box methods, AdVIL provides a tighter estimate of the log partition function and achieves much better empirical results. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sylgsn4Fvr |
https://openreview.net/pdf?id=Sylgsn4Fvr | |
PWC | https://paperswithcode.com/paper/to-relieve-your-headache-of-training-an-mrf |
Repo | |
Framework | |
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension
Title | Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension |
Authors | Anonymous |
Abstract | Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are hard to scale to more domains or more complex reasoning. In this work, we propose the Neural Symbolic Reader (NeRd), which includes a reader, e.g., BERT, to encode the passage and question, and a programmer, e.g., LSTM, to generate a program that is executed to produce the answer. Compared to previous works, NeRd is more scalable in two aspects: (1) domain-agnostic, i.e., the same neural architecture works for different domains; (2) compositional, i.e., when needed, complex programs can be generated by recursively applying the predefined operators, which become executable and interpretable representations for more complex reasoning. Furthermore, to overcome the challenge of training NeRd with weak supervision, we apply data augmentation techniques and hard Expectation-Maximization (EM) with thresholding. On DROP, a challenging reading comprehension dataset that requires discrete reasoning, NeRd achieves 2.5%/1.8% absolute improvement over the state-of-the-art on EM/F1 metrics. With the same architecture, NeRd significantly outperforms the baselines on MathQA, a math problem benchmark that requires multiple steps of reasoning, by a 25.5% absolute increase in accuracy when trained on all the annotated programs. More importantly, NeRd still beats the baselines even when only 20% of the program annotations are given. |
Tasks | Data Augmentation, Reading Comprehension |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxjnREFwH |
https://openreview.net/pdf?id=ryxjnREFwH | |
PWC | https://paperswithcode.com/paper/neural-symbolic-reader-scalable-integration |
Repo | |
Framework | |
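The “compositional, executable programs” aspect can be illustrated with a tiny recursive executor over a few predefined operators applied to numbers extracted from a passage. The operator set and program format below are simplified stand-ins, not NeRd's actual domain-specific language.

```python
def execute(program, numbers):
    """Recursively execute a nested (operator, arguments...) program.

    program : nested tuples, e.g. ("DIFF", ("MAX", "nums"), ("MIN", "nums"))
    numbers : numbers extracted from the passage
    """
    if program == "nums":
        return numbers
    op, *args = program
    vals = [execute(a, numbers) for a in args]
    if op == "COUNT":
        return len(vals[0])
    if op == "SUM":
        return sum(vals[0])
    if op == "MAX":
        return max(vals[0])
    if op == "MIN":
        return min(vals[0])
    if op == "DIFF":
        return vals[0] - vals[1]
    raise ValueError(f"unknown operator {op}")

# "How many more points did the winner score than the loser?"
print(execute(("DIFF", ("MAX", "nums"), ("MIN", "nums")), [24, 17]))  # -> 7
```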
Incorporating BERT into Neural Machine Translation
Title | Incorporating BERT into Neural Machine Translation |
Authors | Anonymous |
Abstract | The recently proposed BERT~\citep{devlin2018bert} has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) remains underexplored. While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, our preliminary exploration shows that, in NMT, using BERT as a contextual embedding works better than using it for fine-tuning. This motivates us to explore how to better leverage BERT for NMT along this direction. We propose a new algorithm named BERT-fused NMT, in which we first use BERT to extract representations for an input sequence, and then fuse the representations with each layer of the encoder and decoder of the NMT model through attention mechanisms. We conduct experiments on supervised (including sentence-level and document-level translation), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets. Our code is available at an anonymous Github page \url{https://github.com/bert-nmt/bert-nmt}. |
Tasks | Machine Translation, Reading Comprehension, Text Classification, Unsupervised Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyl7ygStwB |
https://openreview.net/pdf?id=Hyl7ygStwB | |
PWC | https://paperswithcode.com/paper/incorporating-bert-into-neural-machine |
Repo | |
Framework | |
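A hedged sketch of the fusion mechanism: each NMT encoder layer attends both to its own hidden states (self-attention) and to the precomputed BERT representations of the source sentence (BERT-encoder attention), and the two streams are combined before the feed-forward sublayer. Averaging the two attention outputs and the layer sizes are assumptions; the decoder side (not shown) fuses BERT output analogously.

```python
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """NMT encoder layer that also attends to precomputed BERT features."""
    def __init__(self, d_model=512, d_bert=768, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, kdim=d_bert,
                                               vdim=d_bert, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(),
                                 nn.Linear(2048, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, bert_feats):
        # Two attention streams: over the layer's own states and over BERT output.
        h_self, _ = self.self_attn(x, x, x)
        h_bert, _ = self.bert_attn(x, bert_feats, bert_feats)
        x = self.norm1(x + 0.5 * (h_self + h_bert))   # assumed: simple averaging
        return self.norm2(x + self.ffn(x))
```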
Option Discovery using Deep Skill Chaining
Title | Option Discovery using Deep Skill Chaining |
Authors | Anonymous |
Abstract | Autonomously discovering temporally extended actions, or skills, is a longstanding goal of hierarchical reinforcement learning. We propose a new algorithm that combines skill chaining with deep neural networks to autonomously discover skills in high-dimensional, continuous domains. The resulting algorithm, deep skill chaining, constructs skills with the property that executing one enables the agent to execute another. We demonstrate that deep skill chaining significantly outperforms both non-hierarchical agents and other state-of-the-art skill discovery techniques in challenging continuous control tasks. |
Tasks | Continuous Control, Hierarchical Reinforcement Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gqipNYwH |
https://openreview.net/pdf?id=B1gqipNYwH | |
PWC | https://paperswithcode.com/paper/option-discovery-using-deep-skill-chaining |
Repo | |
Framework | |
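A structural sketch (assumed names; the RL training of each option's policy and initiation classifier is omitted) of the chaining property stated above: each newly discovered skill targets the initiation set of the previously discovered one, so executing one skill enables executing the next, chaining backwards from the task goal.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Option:
    """A temporally extended action: where it starts, what it does, when it ends."""
    initiation: Callable[[object], bool]    # can the option be executed from state s?
    policy: Callable[[object], object]      # the option's learned control policy
    termination: Callable[[object], bool]   # has the option reached its subgoal?

def chain_skills(goal_region: Callable[[object], bool],
                 learn_option: Callable[[Callable], Option],
                 num_options: int) -> List[Option]:
    """Discover a chain of options backwards from the task goal.

    learn_option is an assumed training routine that, given a target region,
    returns an Option whose termination is that region and whose initiation
    classifier covers the states from which its policy reliably reaches it.
    """
    chain, target = [], goal_region
    for _ in range(num_options):
        option = learn_option(target)       # learn to reach the current target
        chain.append(option)
        # The next skill must deliver the agent into this skill's initiation set,
        # so that executing one option enables executing the other.
        target = option.initiation
    return chain
```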
Temporal Probabilistic Asymmetric Multi-task Learning
Title | Temporal Probabilistic Asymmetric Multi-task Learning |
Authors | Anonymous |
Abstract | When performing multi-task predictions with time-series data, knowledge learned for one task at a specific time step may be useful for another task at a later time step (e.g., prediction of sepsis may be useful for prediction of mortality in risk prediction at intensive care units). To capture such dynamically changing asymmetric relationships between tasks and long-range temporal dependencies in time-series data, we propose a novel temporal asymmetric multi-task learning model, which learns to combine features from other tasks at diverse timesteps for the prediction of each task. One crucial challenge here is deciding on the direction and the amount of knowledge transfer, since loss-based knowledge transfer (Lee et al., 2016; 2017) does not apply in our case, where we do not have a loss at each timestep. We tackle this challenge with a novel uncertainty-based probabilistic knowledge transfer mechanism, such that knowledge is transferred from more certain tasks with lower variance to uncertain ones with higher variance. We validate our Temporal Probabilistic Asymmetric Multi-task Learning (TP-AMTL) model on two clinical risk prediction tasks against recent deep learning models for time-series analysis, which our model significantly outperforms by successfully preventing negative transfer. Further qualitative analysis of our model by clinicians suggests that the learned knowledge transfer graphs are helpful in analyzing the model’s predictions. |
Tasks | Multi-Task Learning, Time Series, Time Series Analysis, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygN634KvH |
https://openreview.net/pdf?id=HygN634KvH | |
PWC | https://paperswithcode.com/paper/temporal-probabilistic-asymmetric-multi-task |
Repo | |
Framework | |
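One way to read the uncertainty-based transfer rule in the abstract is as a weighting scheme in which source tasks with lower predictive variance contribute more, and targets with higher variance absorb more of the transferred knowledge. The inverse-variance softmax below is an illustrative assumption, not TP-AMTL's exact formulation.

```python
import torch

def uncertainty_weighted_transfer(task_means, task_vars, temperature=1.0):
    """Augment per-task features with knowledge from more certain tasks.

    task_means : (num_tasks, dim) feature means, one row per task at a timestep
    task_vars  : (num_tasks, dim) predictive variances of those features
    """
    num_tasks = task_means.shape[0]
    # Scalar certainty per source task: lower variance -> larger transfer weight.
    certainty = -task_vars.mean(dim=1) / temperature
    src_weights = torch.softmax(certainty, dim=0)                   # sums to 1
    transferred = (src_weights.unsqueeze(1) * task_means).sum(dim=0)
    # Uncertain target tasks (high variance) take up more transferred knowledge.
    tgt_uncertainty = torch.sigmoid(task_vars.mean(dim=1, keepdim=True))
    return task_means + tgt_uncertainty * transferred.expand(num_tasks, -1)

# Example: 3 tasks, 8-dimensional features at one timestep.
means, variances = torch.randn(3, 8), torch.rand(3, 8)
print(uncertainty_weighted_transfer(means, variances).shape)  # torch.Size([3, 8])
```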
Best feature performance in codeswitched hate speech texts
Title | Best feature performance in codeswitched hate speech texts |
Authors | Anonymous |
Abstract | How well can the concept of hate speech be abstracted to inform automatic classification of codeswitched texts by machine learning classifiers? We explore different representations and empirically evaluate their predictiveness using both conventional and deep learning algorithms for identifying hate speech in a ~48k human-annotated dataset that contains mixed languages, a phenomenon common among multilingual speakers. This paper handles this challenge with a hierarchical approach that employs Latent Dirichlet Allocation to generate topic models, which feed into a high-level feature set that we acronym PDC. PDC groups words with similar meaning into word families during the preprocessing stage for supervised learning models. The high-level PDC features are based on the hate speech annotation framework of Ombui et al. (2019), which is informed by the triangular theory of hate (Sternberg, 2003). Results from frequency-based models using the PDC feature on the annotated dataset of ~48k short messages, comprising tweets generated during the 2012 and 2017 Kenyan presidential elections, indicate an improvement in classification accuracy for identifying hate speech compared to the baseline. |
Tasks | Topic Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skl6peHFwS |
https://openreview.net/pdf?id=Skl6peHFwS | |
PWC | https://paperswithcode.com/paper/best-feature-performance-in-codeswitched-hate |
Repo | |
Framework | |
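A hedged sketch of the hierarchical pipeline described above: LDA topic proportions are generated from the raw messages and passed into a conventional supervised classifier. The PDC word-family grouping itself is not reproduced; the texts, labels, and model choices below are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy code-switched messages with hate / non-hate labels (entirely made up).
texts = ["wale watu wa upande mwingine ni wanyama",
         "great rally today, tupo pamoja",
         "hao watu should be chased away from our county",
         "congratulations to the winning candidate"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("counts", CountVectorizer()),                        # frequency-based features
    ("topics", LatentDirichletAllocation(n_components=2,  # topic-model layer
                                         random_state=0)),
    ("clf", LogisticRegression()),                        # supervised learner
])
pipeline.fit(texts, labels)
print(pipeline.predict(["wale watu ni wanyama kabisa"]))
```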
Mixed Precision DNNs: All you need is a good parametrization
Title | Mixed Precision DNNs: All you need is a good parametrization |
Authors | Anonymous |
Abstract | Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth for the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows the quantizer's parameters to be learned with gradient methods. We show that a suitable parametrization of the quantizer is the key to stable training and good final performance. Specifically, we propose to parametrize the quantizer with the step size and dynamic range; the bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet and obtain mixed precision DNNs with learned quantization parameters, achieving state-of-the-art performance. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyx0slrFvH |
https://openreview.net/pdf?id=Hyx0slrFvH | |
PWC | https://paperswithcode.com/paper/mixed-precision-dnns-all-you-need-is-a-good |
Repo | |
Framework | |
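A hedged sketch of the proposed parametrization: the quantizer learns a step size and a dynamic range, the bitwidth is inferred from the two, and a straight-through estimator lets gradients flow through the rounding. Unsigned quantization and the log-domain parameters are simplifying assumptions.

```python
import math
import torch
import torch.nn as nn

class LearnedUniformQuantizer(nn.Module):
    """Uniform quantizer parametrized by step size and dynamic range.

    The bitwidth is not a trainable parameter itself; it is inferred from the
    two learned quantities.
    """
    def __init__(self, step_size=0.05, dyn_range=1.0):
        super().__init__()
        self.log_step = nn.Parameter(torch.tensor(math.log(step_size)))
        self.log_range = nn.Parameter(torch.tensor(math.log(dyn_range)))

    def inferred_bitwidth(self):
        d, r = self.log_step.exp(), self.log_range.exp()
        return torch.log2(r / d + 1.0)          # number of grid levels -> bits

    def forward(self, x):
        d, r = self.log_step.exp(), self.log_range.exp()
        x_clipped = torch.minimum(x.clamp(min=0.0), r)   # clip to [0, range]
        q = torch.round(x_clipped / d) * d               # snap to the uniform grid
        # Straight-through estimator: forward pass uses q, backward sees x_clipped.
        return x_clipped + (q - x_clipped).detach()

quantizer = LearnedUniformQuantizer()
print(quantizer(torch.rand(8)), quantizer.inferred_bitwidth())
```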
Pruned Graph Scattering Transforms
Title | Pruned Graph Scattering Transforms |
Authors | Anonymous |
Abstract | Graph convolutional networks (GCNs) have achieved remarkable performance in a variety of network science learning tasks. However, theoretical analysis of such approaches is still in its infancy. Graph scattering transforms (GSTs) are non-trainable deep GCN models that are amenable to generalization and stability analyses. The present work addresses some limitations of GSTs by introducing a novel, so-termed pruned (p)GST approach. The resulting pruning algorithm is guided by a graph-spectrum-inspired criterion, and retains informative scattering features on-the-fly while bypassing the exponential complexity associated with GSTs. It is further established that pGSTs are stable to perturbations of the input graph signals with bounded energy. Experiments showcase that i) pGST performs comparably to the baseline GST that uses all scattering features, while achieving significant computational savings; ii) pGST achieves comparable performance to state-of-the-art GCNs; and iii) graph data from various domains lead to different scattering patterns, suggesting domain-adaptive pGST network architectures. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeg7TEYwB |
https://openreview.net/pdf?id=rJeg7TEYwB | |
PWC | https://paperswithcode.com/paper/pruned-graph-scattering-transforms |
Repo | |
Framework | |
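A hedged sketch of the pruning idea: the scattering tree is grown by applying a graph wavelet filter bank followed by a modulus nonlinearity, and a branch is expanded only when its energy passes a threshold, which avoids the exponential growth of the full tree. The plain energy criterion below simplifies the paper's graph-spectrum-inspired rule.

```python
import numpy as np

def pruned_graph_scattering(x, wavelets, depth, threshold):
    """Grow a graph scattering tree, expanding only high-energy branches.

    x         : (N,) graph signal
    wavelets  : list of (N, N) graph wavelet operators (the filter bank)
    depth     : maximum number of scattering layers
    threshold : minimum branch energy required to scatter the branch further
    """
    features, frontier = [x.mean()], [x]
    for _ in range(depth):
        next_frontier = []
        for signal in frontier:
            for H in wavelets:
                u = np.abs(H @ signal)            # wavelet filter + modulus
                features.append(u.mean())         # aggregated scattering feature
                if np.sum(u ** 2) >= threshold:   # prune low-energy branches
                    next_frontier.append(u)
        frontier = next_frontier
    return np.array(features)

# Illustration with random wavelets on a 5-node graph.
rng = np.random.default_rng(0)
bank = [rng.standard_normal((5, 5)) * 0.3 for _ in range(2)]
print(pruned_graph_scattering(rng.standard_normal(5), bank, depth=3, threshold=0.5))
```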
Affine Self Convolution
Title | Affine Self Convolution |
Authors | Anonymous |
Abstract | Attention mechanisms, and most prominently self-attention, are a powerful building block for processing not only text but also images. They provide a parameter-efficient method for aggregating inputs. We focus on self-attention in vision models and combine it with convolution, which, as far as we know, we are the first to do. What emerges is a convolution with data-dependent filters. We call this an Affine Self Convolution. While it is applied differently at each spatial location, we show that it is translation equivariant. We also modify the Squeeze and Excitation variant of attention, extending both variants of attention to the roto-translation group. We evaluate these new models on CIFAR10 and CIFAR100 and show a reduction in the number of parameters, while reaching comparable or higher accuracy at test time against self-trained baselines. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkejpaVFDH |
https://openreview.net/pdf?id=BkejpaVFDH | |
PWC | https://paperswithcode.com/paper/affine-self-convolution |
Repo | |
Framework | |
Continuous Convolutional Neural Network for Nonuniform Time Series
Title | Continuous Convolutional Neural Network for Nonuniform Time Series |
Authors | Anonymous |
Abstract | Convolutional neural networks (CNNs) for time series data implicitly assume that the data are uniformly sampled, whereas many event-based and multi-modal data are nonuniform or have heterogeneous sampling rates. Directly applying a regular CNN to nonuniform time series is ungrounded, because it is unable to recognize and extract common patterns from the nonuniform input signals. Converting the nonuniform time series to uniform ones by interpolation preserves the pattern extraction capability of CNN, but the interpolation kernels are often preset and may be unsuitable for the data or tasks. In this paper, we propose the Continuous CNN (CCNN), which estimates the inherent continuous inputs by interpolation and performs continuous convolution on the continuous input. The interpolation and convolution kernels are learned in an end-to-end manner and are able to learn useful patterns despite the nonuniform sampling rate. Moreover, CCNN is a strict generalization of CNN. Results of several experiments verify that CCNN achieves better performance on nonuniform data and learns meaningful continuous kernels. |
Tasks | Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1e4MkSFDr |
https://openreview.net/pdf?id=r1e4MkSFDr | |
PWC | https://paperswithcode.com/paper/continuous-convolutional-neural-network |
Repo | |
Framework | |
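A hedged sketch of the core mechanism: a small network maps the continuous time offset between an output location and each nonuniform sample to a convolution weight, so the interpolation and convolution kernels are learned end-to-end rather than preset. Shapes, the kernel network, and the normalization are assumptions.

```python
import torch
import torch.nn as nn

class ContinuousConv1d(nn.Module):
    """Convolution over nonuniformly sampled time series via a learned kernel."""
    def __init__(self, in_ch, out_ch, hidden=32):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        # Kernel network: continuous time offset -> an (out_ch, in_ch) weight.
        self.kernel_net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, out_ch * in_ch),
        )

    def forward(self, t_in, x_in, t_out):
        """
        t_in  : (N,)        nonuniform input timestamps
        x_in  : (N, in_ch)  input values at those timestamps
        t_out : (M,)        query timestamps for the output
        """
        # Pairwise offsets between every output and input timestamp: (M, N, 1).
        offsets = (t_out[:, None] - t_in[None, :]).unsqueeze(-1)
        w = self.kernel_net(offsets).view(len(t_out), len(t_in),
                                          self.out_ch, self.in_ch)
        # Continuous convolution: integrate kernel * signal over the samples.
        return torch.einsum("mnoi,ni->mo", w, x_in) / len(t_in)

# Seven irregular samples of a 2-channel signal, queried at 4 uniform times.
t_in, x_in = torch.sort(torch.rand(7)).values, torch.randn(7, 2)
print(ContinuousConv1d(2, 4)(t_in, x_in, torch.linspace(0, 1, 4)).shape)  # [4, 4]
```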
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Title | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
Authors | Anonymous |
Abstract | We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone and extends it to take both visual and linguistic embedded features as input. Each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks, such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark. |
Tasks | Question Answering, Visual Commonsense Reasoning, Visual Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygXPaEYvH |
https://openreview.net/pdf?id=SygXPaEYvH | |
PWC | https://paperswithcode.com/paper/vl-bert-pre-training-of-generic-visual-1 |
Repo | |
Framework | |
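A hedged sketch of how a VL-BERT-style input sequence is assembled: every element is a word or an RoI, and its embedding is the sum of a token embedding, a visual feature embedding, a segment embedding, and a position embedding; word elements get the whole-image feature as their visual part, and RoI elements get a special [IMG] token as their linguistic part. Dimensions and the linear projection are placeholders.

```python
import torch
import torch.nn as nn

class VLBertStyleEmbedding(nn.Module):
    """Builds the joint input sequence of words and image regions."""
    def __init__(self, vocab_size, d_model=768, visual_dim=2048, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)   # words and [IMG]
        self.visual_proj = nn.Linear(visual_dim, d_model)    # RoI appearance feats
        self.segment_emb = nn.Embedding(2, d_model)          # text vs. image
        self.position_emb = nn.Embedding(max_len, d_model)

    def forward(self, word_ids, img_token_id, roi_feats, whole_image_feat):
        """
        word_ids         : (T,) token ids of the sentence
        img_token_id     : id of the special [IMG] token
        roi_feats        : (R, visual_dim) features of the image regions
        whole_image_feat : (visual_dim,) feature of the full image
        """
        T, R = word_ids.shape[0], roi_feats.shape[0]
        # Linguistic stream: real words for text, [IMG] placeholders for RoIs.
        tokens = torch.cat([word_ids,
                            torch.full((R,), img_token_id, dtype=torch.long)])
        # Visual stream: whole-image feature for words, RoI features for regions.
        visual = torch.cat([whole_image_feat.expand(T, -1), roi_feats])
        segments = torch.cat([torch.zeros(T, dtype=torch.long),
                              torch.ones(R, dtype=torch.long)])
        positions = torch.arange(T + R)
        return (self.token_emb(tokens) + self.visual_proj(visual)
                + self.segment_emb(segments) + self.position_emb(positions))
```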
Cursor-based Adaptive Quantization for Deep Neural Network
Title | Cursor-based Adaptive Quantization for Deep Neural Network |
Authors | Anonymous |
Abstract | Deep neural networks (DNNs) have rapidly found many applications in different scenarios. However, their large computational cost and memory consumption are barriers for applications with constrained computing resources. DNN model quantization is a widely used method to reduce the DNN storage and computation burden by decreasing the bit width. In this paper, we propose a novel cursor-based adaptive quantization method using differentiable architecture search (DAS). The multi-bit quantization mechanism is formulated as a DAS process with a continuous cursor that represents the possible quantization bitwidth. The cursor-based DAS adaptively searches for the desired quantization bitwidth for each layer. The DAS process can be solved via an alternative approximate optimization process, which is designed for the mixed quantization scheme of a DNN model. We further devise a new loss function in the search process to simultaneously optimize the accuracy and the parameter size of the model. In the quantization step, based on a new strategy, the two integers closest to the cursor are adopted as the bitwidths to quantize the DNN jointly, which reduces the quantization noise and avoids the local convergence problem. Comprehensive experiments on benchmark datasets show that our cursor-based adaptive quantization approach achieves a new state of the art for multi-bit quantization and can efficiently obtain smaller models with comparable or even better classification accuracy. |
Tasks | Quantization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gL3RVtwr |
https://openreview.net/pdf?id=H1gL3RVtwr | |
PWC | https://paperswithcode.com/paper/cursor-based-adaptive-quantization-for-deep |
Repo | |
Framework | |
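A hedged sketch of the quantization step described above: the continuous cursor for a layer is bracketed by its two nearest integer bitwidths, the weights are quantized at both, and the two results are blended by the cursor's fractional position, which is gentler than rounding the cursor to a single bitwidth. The symmetric uniform quantizer and the linear blend are assumptions.

```python
import math
import torch

def quantize_to_bits(w, bits):
    """Symmetric uniform quantization of a weight tensor to a given bitwidth."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 3 bits -> integer grid [-3, 3]
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def cursor_quantize(w, cursor):
    """Blend quantizations at the two integer bitwidths closest to the cursor."""
    low = math.floor(cursor)
    frac = cursor - low                          # cursor position between the two
    return ((1.0 - frac) * quantize_to_bits(w, low)
            + frac * quantize_to_bits(w, low + 1))

w = torch.randn(4, 4)
print(cursor_quantize(w, cursor=3.3))            # mix of 3-bit and 4-bit quantization
```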