April 1, 2020

3180 words 15 mins read

Paper Group NANR 75

Exploring the Pareto-Optimality between Quality and Diversity in Text Generation. The Detection of Distributional Discrepancy for Text Generation. Coresets for Accelerating Incremental Gradient Methods. CONTRIBUTION OF INTERNAL REFLECTION IN LANGUAGE EMERGENCE WITH AN UNDER-RESTRICTED SITUATION. Gradient Perturbation is Underrated for Differentiall …

Exploring the Pareto-Optimality between Quality and Diversity in Text Generation

Title Exploring the Pareto-Optimality between Quality and Diversity in Text Generation
Authors Anonymous
Abstract Quality and diversity are two essential aspects of performance evaluation for text generation models. Quality indicates how likely the generated samples are to be real samples, and diversity indicates how different the generated samples are from one another. Though quality and diversity metrics have been widely used for evaluation, it is still not clear what the relationship between them is. In this paper, we give a theoretical analysis of a multi-objective programming problem where quality and diversity are both expected to be maximized. We prove that there exists a family of Pareto-optimal solutions, explaining the widely observed tradeoff between quality and diversity in practice. We also characterize the structure of such solutions, and show that a linear combination of quality and diversity is sufficient to measure the divergence between the generated distribution and the real distribution. Further, we derive an efficient algorithm to reach the Pareto-optimal solutions in practice, enabling a controllable quality-diversity tradeoff.
Tasks Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=BygInyBFPr
PDF https://openreview.net/pdf?id=BygInyBFPr
PWC https://paperswithcode.com/paper/exploring-the-pareto-optimality-between
Repo
Framework
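
A minimal numerical sketch of the kind of statement made in the abstract: with quality defined as E_Q[log P(x)] and diversity as the entropy H(Q) of the generated distribution (illustrative definitions, not necessarily the paper's exact ones), their sum equals -KL(Q || P), i.e. a linear combination of quality and diversity measures the divergence between the generated and real distributions.

```python
# Illustrative check that quality + diversity = -KL(Q || P) under the
# definitions stated above; P and Q are toy distributions over 3 tokens.
import numpy as np

P = np.array([0.5, 0.3, 0.2])        # "real" distribution
Q = np.array([0.6, 0.25, 0.15])      # "generated" distribution

quality = np.sum(Q * np.log(P))      # how likely generated samples are under P
diversity = -np.sum(Q * np.log(Q))   # entropy of the generated distribution
kl = np.sum(Q * np.log(Q / P))       # KL(Q || P)

print(quality + diversity, -kl)      # the two quantities coincide
```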

The Detection of Distributional Discrepancy for Text Generation

Title The Detection of Distributional Discrepancy for Text Generation
Authors Anonymous
Abstract The text generated by neural language models is not as good as real text, which means their distributions differ. Generative Adversarial Nets (GANs) have been used to alleviate this. However, some researchers argue that GAN variants do not work at all: when both sample quality (e.g., BLEU) and sample diversity (e.g., self-BLEU) are taken into account, the GAN variants are even worse than a well-tuned language model. But BLEU and self-BLEU cannot precisely measure this distributional discrepancy; in fact, how to measure the distributional discrepancy between real text and generated text is still an open problem. In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text, and we put forward a method to estimate them. First, we evaluate a language model with these two functions and find the difference is large. Then, we try several methods to use the detected discrepancy signal to improve the generator; however, the difference becomes even bigger than before. Experimenting on two existing language GANs, the distributional discrepancy between real text and generated text increases with more adversarial learning rounds, demonstrating that both of these language GANs fail.
Tasks Language Modelling, Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=SylurJHFPS
PDF https://openreview.net/pdf?id=SylurJHFPS
PWC https://paperswithcode.com/paper/the-detection-of-distributional-discrepancy
Repo
Framework
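
The paper proposes two specific metric functions; as generic background, here is a minimal sketch of one standard way to estimate a distributional discrepancy from samples: train a classifier to tell real from generated examples and read the gap off its held-out accuracy (0.5 means indistinguishable, 1.0 means fully separable). The features and the tiny logistic-regression trainer are illustrative, not the paper's estimator.

```python
import numpy as np

def classifier_discrepancy(real, fake, epochs=200, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = np.concatenate([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    idx = rng.permutation(len(X)); split = len(X) // 2
    tr, te = idx[:split], idx[split:]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):                        # plain logistic regression
        p = 1.0 / (1.0 + np.exp(-(X[tr] @ w + b)))
        g = X[tr].T @ (p - y[tr]) / len(tr)
        w -= lr * g
        b -= lr * np.mean(p - y[tr])
    acc = np.mean(((X[te] @ w + b) > 0) == y[te].astype(bool))
    return max(2.0 * acc - 1.0, 0.0)               # crude lower bound on total variation

# Toy usage: "generated" samples are slightly shifted versions of the "real" ones.
rng = np.random.default_rng(1)
real, fake = rng.normal(0.0, 1.0, (400, 5)), rng.normal(0.7, 1.0, (400, 5))
print(classifier_discrepancy(real, fake))
```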

Coresets for Accelerating Incremental Gradient Methods

Title Coresets for Accelerating Incremental Gradient Methods
Authors Anonymous
Abstract Many machine learning problems reduce to the problem of minimizing an expected risk. Incremental gradient (IG) methods, such as stochastic gradient descent and its variants, have been successfully used to train the largest of machine learning models. IG methods, however, are in general slow to converge and sensitive to stepsize choices. Therefore, much work has focused on speeding them up by reducing the variance of the estimated gradient or choosing better stepsizes. An alternative strategy is to select a carefully chosen subset of the training data, train only on that subset, and hence speed up optimization. However, it remains an open question how to achieve this, both theoretically and practically, without compromising the quality of the final model. Here we develop CRAIG, a method for selecting a weighted subset (or coreset) of training data in order to speed up IG methods. We prove that by greedily selecting a subset S of training data that minimizes an upper bound on the estimation error of the full gradient, running IG on this subset converges to the (near-)optimal solution in the same number of epochs as running IG on the full data. But because at each epoch the gradients are computed only on the subset S, we obtain a speedup that is inversely proportional to the size of S. Our subset selection algorithm is fully general and can be applied to most IG methods. We further demonstrate the practical effectiveness of CRAIG through an extensive set of experiments on several applications, including logistic regression and deep neural networks. Experiments show that CRAIG, while achieving practically the same loss, speeds up IG methods by up to 10x for convex and 3x for non-convex (deep learning) problems.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SygRikHtvS
PDF https://openreview.net/pdf?id=SygRikHtvS
PWC https://paperswithcode.com/paper/coresets-for-accelerating-incremental
Repo
Framework
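
A minimal sketch of the coreset-selection idea described in the abstract: greedily pick points so that every training point has a nearby selected "representative", then weight each representative by the number of points it covers, so the weighted subset gradient approximates the full gradient. Per-example feature vectors stand in for per-example gradients, and the function names are illustrative, not the authors' code.

```python
import numpy as np

def craig_style_coreset(features, k):
    """Greedy facility-location selection of k weighted representatives."""
    n = features.shape[0]
    # Pairwise distances between per-example (proxy) gradients.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    selected = []
    # Distance from each point to its closest selected representative.
    closest = np.full(n, dists.max() + 1.0)
    for _ in range(k):
        # Pick the point whose addition most reduces the total covering distance.
        gains = np.sum(np.maximum(closest[None, :] - dists, 0.0), axis=1)
        j = int(np.argmax(gains))
        selected.append(j)
        closest = np.minimum(closest, dists[j])
    # Weight of each representative = number of points it covers.
    assign = np.argmin(dists[selected], axis=0)
    weights = np.bincount(assign, minlength=k).astype(float)
    return np.array(selected), weights

# Toy usage: 200 points in 2-D, keep a 10-point weighted coreset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
subset, w = craig_style_coreset(X, k=10)
print(subset, w.sum())   # the weights sum to the dataset size
```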

CONTRIBUTION OF INTERNAL REFLECTION IN LANGUAGE EMERGENCE WITH AN UNDER-RESTRICTED SITUATION

Title CONTRIBUTION OF INTERNAL REFLECTION IN LANGUAGE EMERGENCE WITH AN UNDER-RESTRICTED SITUATION
Authors Anonymous
Abstract Owing to language emergence, human beings have been able to understand the intentions of others, form common concepts, and extend them to new ones. Artificial intelligence researchers have not only predicted words and sentences statistically with machine learning, but also created language systems by letting machines communicate with each other. However, current studies impose strong constraints (supervision signals and rewards exist, or concepts are fixed to a single point), which hinders the emergence of real-world-like languages. In this study, we build on the work of Batali (1998) and Choi et al. (2018) and attempt language emergence under low-constraint conditions resembling human language generation. We incorporate a bias that exists in humans into the system as an “internal reflection function”. With or without this function, messages corresponding to the labels could be generated. However, through qualitative and quantitative analysis, we confirmed that the internal reflection function caused “overlearning” and a different structuring of message patterns. These results suggest that the internal reflection function is effective for creating a grounded language from raw images in an under-restricted setting such as human language generation.
Tasks Text Generation
Published 2020-01-01
URL https://openreview.net/forum?id=BylQm1HKvB
PDF https://openreview.net/pdf?id=BylQm1HKvB
PWC https://paperswithcode.com/paper/contribution-of-internal-reflection-in
Repo
Framework

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

Title Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Authors Anonymous
Abstract Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that satisfies the privacy requirement and then analyzes the utility of the noisy gradient updates as in the non-private case. In this paper, we explore how the privacy noise affects the optimization properties. We show that for differentially private convex optimization, the utility guarantee of both DP-GD and DP-SGD is determined by an expected curvature rather than the minimum curvature. The expected curvature represents the average curvature over the optimization path, which is usually much larger than the minimum curvature and hence yields a significantly improved utility guarantee. By using the expected curvature, our theory justifies the advantage of gradient perturbation over other perturbation methods and closes the gap between theory and practice. Extensive experiments on real-world datasets corroborate our theoretical findings.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJgVwTVtvS
PDF https://openreview.net/pdf?id=rJgVwTVtvS
PWC https://paperswithcode.com/paper/gradient-perturbation-is-underrated-for
Repo
Framework
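
A minimal sketch of gradient perturbation (DP-SGD style) for convex optimization, the mechanism analyzed in the abstract: clip each per-example gradient and add Gaussian noise before every update. The clipping norm and noise multiplier below are illustrative; calibrating sigma to a target (epsilon, delta) budget is out of scope here.

```python
import numpy as np

def dp_sgd_logistic(X, y, epochs=20, lr=0.1, clip=1.0, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-example gradient of the logistic loss.
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
            g = (p - y[i]) * X[i]
            # Clip to bound each example's influence (sensitivity).
            g = g / max(1.0, np.linalg.norm(g) / clip)
            # Gaussian noise calibrated to the clipping norm; a small sigma is
            # used here purely so the toy result stays readable.
            noise = rng.normal(0.0, sigma * clip, size=d)
            w -= lr * (g + noise)
    return w

# Toy usage on a separable 2-D problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(dp_sgd_logistic(X, y))
```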

Gumbel-Matrix Routing for Flexible Multi-task Learning

Title Gumbel-Matrix Routing for Flexible Multi-task Learning
Authors Anonymous
Abstract This paper proposes a novel per-task routing method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by sharing parameters. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, routing networks can be applied to learn to share each group of parameters with a different subset of tasks so as to better leverage task relatedness. However, this use of routing methods requires addressing the challenge of learning the routing jointly with the parameters of a modular multi-task neural network. We propose Gumbel-Matrix routing, a novel multi-task routing method based on the Gumbel-Softmax, designed to learn fine-grained parameter sharing. When applied to the Omniglot benchmark, the proposed method improves the state-of-the-art error rate by 17%.
Tasks Multi-Task Learning
Published 2020-01-01
URL https://openreview.net/forum?id=S1lHfxBFDH
PDF https://openreview.net/pdf?id=S1lHfxBFDH
PWC https://paperswithcode.com/paper/gumbel-matrix-routing-for-flexible-multi-task
Repo
Framework
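
A minimal sketch of the Gumbel-Matrix routing idea from the abstract: each (task, module) pair holds logits over {off, on}; a binary routing matrix is sampled with the straight-through Gumbel-Softmax so the routing can be learned jointly with the module weights. The shapes and module choices are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelMatrixRouter(nn.Module):
    def __init__(self, num_tasks, num_modules, dim):
        super().__init__()
        self.modules_list = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_modules)]
        )
        # One pair of logits (off/on) per task-module connection.
        self.route_logits = nn.Parameter(torch.zeros(num_tasks, num_modules, 2))

    def forward(self, x, task_id, tau=1.0):
        # Sample a {0,1} mask per module; gradients flow via straight-through.
        mask = F.gumbel_softmax(self.route_logits[task_id], tau=tau, hard=True)[:, 1]
        outs = torch.stack([m(x) for m in self.modules_list], dim=0)  # (M, B, D)
        # Average the outputs of the active modules (avoid division by zero).
        return (mask[:, None, None] * outs).sum(0) / mask.sum().clamp(min=1.0)

# Toy usage: 3 tasks share 4 candidate modules on 8-dimensional inputs.
router = GumbelMatrixRouter(num_tasks=3, num_modules=4, dim=8)
y = router(torch.randn(5, 8), task_id=0)
print(y.shape)  # torch.Size([5, 8])
```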

Statistical Verification of General Perturbations by Gaussian Smoothing

Title Statistical Verification of General Perturbations by Gaussian Smoothing
Authors Anonymous
Abstract We present a novel statistical certification method that generalizes prior work based on smoothing to handle richer perturbations. Concretely, our method produces a provable classifier which can establish statistical robustness against geometric perturbations (e.g., rotations, translations) as well as volume changes and pitch shifts on audio data. The generalization is non-trivial and requires careful handling of operations such as interpolation. Our method is agnostic to the choice of classifier and scales to modern architectures such as ResNet-50 on ImageNet.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1eZweHFwr
PDF https://openreview.net/pdf?id=B1eZweHFwr
PWC https://paperswithcode.com/paper/statistical-verification-of-general
Repo
Framework
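
A minimal sketch of certification by Gaussian smoothing, the mechanism the abstract builds on: sample noisy copies of the input, take the majority class of the base classifier, lower-bound its probability, and convert that bound into a certified L2 radius (Cohen et al.-style). The base classifier and the Hoeffding bound below are illustrative assumptions; the paper generalizes this to geometric and audio perturbations.

```python
import numpy as np
from statistics import NormalDist

def certify_gaussian(base_classifier, x, sigma=0.25, n=1000, alpha=0.001, seed=0):
    rng = np.random.default_rng(seed)
    # Monte Carlo estimate of the smoothed classifier's class probabilities.
    votes = {}
    for _ in range(n):
        c = base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
        votes[c] = votes.get(c, 0) + 1
    top_class, top_count = max(votes.items(), key=lambda kv: kv[1])
    # Hoeffding lower confidence bound on the top-class probability.
    p_lower = top_count / n - np.sqrt(np.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return top_class, 0.0  # abstain: no non-trivial certificate
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top_class, radius

# Toy usage: a linear "classifier" on 2-D inputs.
clf = lambda z: int(z[0] + z[1] > 0)
print(certify_gaussian(clf, np.array([1.0, 1.0])))
```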

Occlusion resistant learning of intuitive physics from videos

Title Occlusion resistant learning of intuitive physics from videos
Authors Anonymous
Abstract To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no occlusions occur, narrowing the potential areas of application. The main contribution of this paper is a method combining a predictor of object dynamics with a neural renderer that efficiently predicts future trajectories and explicitly models partial and full occlusions among objects. We present a training procedure that enables learning intuitive physics directly from input videos containing segmentation masks of objects and their depth. Our results show that our model learns object dynamics despite significant inter-object occlusions, and realistically predicts segmentation masks up to 30 frames into the future. We study model performance for increasing levels of occlusion, and compare results to previous work on the tasks of future prediction and object following. We also show results on predicting the motion of objects in real videos and demonstrate significant improvements over the state of the art on the object permanence task in the intuitive physics benchmark of Riochet et al. (2018).
Tasks Future prediction
Published 2020-01-01
URL https://openreview.net/forum?id=HylfPgHYvr
PDF https://openreview.net/pdf?id=HylfPgHYvr
PWC https://paperswithcode.com/paper/occlusion-resistant-learning-of-intuitive
Repo
Framework
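
A minimal sketch of the two-part structure described in the abstract: a dynamics model predicts each object's next state, and a renderer turns states into segmentation masks, with occlusion handled by painting objects back-to-front using their depth. The learned components are replaced by hand-written stand-ins; everything here is illustrative, not the paper's model.

```python
import numpy as np

def predict_dynamics(states, dt=1.0):
    """states: (N, 5) rows of [x, y, depth, vx, vy]; constant-velocity step."""
    nxt = states.copy()
    nxt[:, 0] += dt * states[:, 3]
    nxt[:, 1] += dt * states[:, 4]
    return nxt

def render_masks(states, size=32, radius=4):
    """Painter's algorithm: draw far objects first so near ones occlude them."""
    seg = np.zeros((size, size), dtype=np.int32)           # 0 = background
    yy, xx = np.mgrid[0:size, 0:size]
    order = np.argsort(-states[:, 2])                      # far -> near
    for obj_id in order:
        x, y = states[obj_id, 0], states[obj_id, 1]
        seg[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = obj_id + 1
    return seg

# Toy usage: two objects on a collision course; the nearer one (smaller depth)
# occludes the farther one once they overlap.
states = np.array([[10.0, 16.0, 1.0, 2.0, 0.0],
                   [22.0, 16.0, 5.0, -2.0, 0.0]])
for _ in range(3):
    states = predict_dynamics(states)
print(np.unique(render_masks(states), return_counts=True))
```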

CROSS-DOMAIN CASCADED DEEP TRANSLATION

Title CROSS-DOMAIN CASCADED DEEP TRANSLATION
Authors Anonymous
Abstract In recent years we have witnessed tremendous progress in unpaired image-to-image translation methods, propelled by the emergence of DNNs and adversarial training strategies. However, most existing methods focus on the transfer of style and appearance rather than on shape translation. The latter task is challenging, due to its intricate non-local nature, which calls for additional supervision. We mitigate this by descending the deep layers of a pre-trained network, where the deep features contain more semantics, and applying the translation between these deep features. Specifically, we leverage VGG, a classification network pre-trained with large-scale semantic supervision. Our translation is performed in a cascaded, deep-to-shallow fashion along the deep feature hierarchy: we first translate between the deepest layers that encode the higher-level semantic content of the image, then proceed to translate the shallower layers, conditioned on the deeper ones. We show that our method is able to translate between different domains which exhibit significantly different shapes. We evaluate our method both qualitatively and quantitatively and compare it to state-of-the-art image-to-image translation methods. Our code and trained models will be made available.
Tasks Image-to-Image Translation
Published 2020-01-01
URL https://openreview.net/forum?id=BJe4JJBYwS
PDF https://openreview.net/pdf?id=BJe4JJBYwS
PWC https://paperswithcode.com/paper/cross-domain-cascaded-deep-translation
Repo
Framework
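
A minimal structural sketch of the cascaded, deep-to-shallow translation described in the abstract: translate the deepest feature map first, then translate each shallower level conditioned on the (upsampled) already translated deeper level. The encoder here is a stack of strided convolutions standing in for pre-trained VGG features; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedTranslator(nn.Module):
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        chans = (3,) + channels
        # Feature hierarchy (shallow -> deep), a stand-in for VGG layers.
        self.encoder = nn.ModuleList(
            [nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1)
             for i in range(len(channels))]
        )
        # One translator per level; shallower ones also see the deeper result.
        self.translators = nn.ModuleList()
        for i, c in enumerate(channels):
            extra = channels[i + 1] if i + 1 < len(channels) else 0
            self.translators.append(nn.Conv2d(c + extra, c, 3, padding=1))

    def forward(self, x):
        feats, h = [], x
        for enc in self.encoder:
            h = F.relu(enc(h))
            feats.append(h)                      # shallow -> deep
        translated, out = None, [None] * len(feats)
        for i in reversed(range(len(feats))):    # deep -> shallow cascade
            inp = feats[i]
            if translated is not None:
                up = F.interpolate(translated, size=inp.shape[-2:], mode="nearest")
                inp = torch.cat([inp, up], dim=1)
            translated = self.translators[i](inp)
            out[i] = translated
        return out

# Toy usage: translate the feature hierarchy of a 3x64x64 image.
outs = CascadedTranslator()(torch.randn(1, 3, 64, 64))
print([o.shape for o in outs])
```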

Are Few-shot Learning Benchmarks Too Simple?

Title Are Few-shot Learning Benchmarks Too Simple?
Authors Anonymous
Abstract We argue that the widely used Omniglot and miniImageNet benchmarks are too simple because their class semantics do not vary across episodes, which defeats their intended purpose of evaluating few-shot classification methods. The class semantics of Omniglot are invariably “characters”, and the class semantics of miniImageNet, “object category”. Because the class semantics are so consistent, we can propose a new method, Centroid Networks, which achieves surprisingly high accuracies on Omniglot and miniImageNet without using any labels at meta-evaluation time. Our results suggest that these benchmarks are not well suited to supervised few-shot classification, since the supervision itself is not necessary during meta-evaluation. The Meta-Dataset, a collection of 10 datasets, was recently proposed as a harder few-shot classification benchmark. Using our method, we derive a new metric, the Class Semantics Consistency Criterion, and use it to quantify the difficulty of Meta-Dataset. Finally, under some restrictive assumptions, we show that Centroid Networks is faster and more accurate than a state-of-the-art learning-to-cluster method (Hsu et al., 2018).
Tasks Few-Shot Learning, Omniglot
Published 2020-01-01
URL https://openreview.net/forum?id=SygeY1SYvr
PDF https://openreview.net/pdf?id=SygeY1SYvr
PWC https://paperswithcode.com/paper/are-few-shot-learning-benchmarks-too-simple
Repo
Framework
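
A minimal sketch of why label-free meta-evaluation can work when class semantics never change, the point the abstract makes with Centroid Networks: cluster the embedded support set with k-means and classify queries by the nearest cluster centroid, using no support labels at all. The embedding here is the identity function; everything is illustrative rather than the paper's method.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        centroids = np.stack([
            X[assign == j].mean(0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
    return centroids

def centroid_predict(support, queries, k):
    centroids = kmeans(support, k)
    return np.argmin(((queries[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

# Toy episode: 2 well-separated classes; cluster identities are arbitrary, but
# queries from the same class land in the same cluster without any labels.
rng = np.random.default_rng(1)
support = np.concatenate([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
queries = np.concatenate([rng.normal(0, 0.1, (3, 2)), rng.normal(3, 0.1, (3, 2))])
print(centroid_predict(support, queries, k=2))
```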

Fix-Net: pure fixed-point representation of deep neural networks

Title Fix-Net: pure fixed-point representation of deep neural networks
Authors Anonymous
Abstract Deep neural networks (DNNs) dominate current research in machine learning. Due to massive GPU parallelization, DNN training is no longer a bottleneck, and large models with many parameters and high computational effort top common benchmark tables. In contrast, embedded devices have very limited capabilities. As a result, both model size and inference time must be significantly reduced if DNNs are to achieve suitable performance on embedded devices. We propose a soft quantization approach to train DNNs that can be evaluated using pure fixed-point arithmetic. By exploiting the bit-shift mechanism, we derive fixed-point quantization constraints for all important components, including batch normalization and ReLU. Compared to floating-point arithmetic, fixed-point calculations significantly reduce computational effort, whereas low-bit representations immediately decrease memory costs. We evaluate our approach with different architectures on common benchmark datasets and compare it with recent quantization approaches. We achieve new state-of-the-art performance using 4-bit fixed-point models, with an error rate of 4.98% on CIFAR-10.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=rJgKzlSKPH
PDF https://openreview.net/pdf?id=rJgKzlSKPH
PWC https://paperswithcode.com/paper/fix-net-pure-fixed-point-representation-of
Repo
Framework
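
A minimal sketch of the pure fixed-point representation discussed in the abstract: values are stored as b-bit signed integers with f fractional bits, so scaling by 2^f or 2^-f is just a bit shift. This is a generic fixed-point quantizer for illustration, not the paper's training procedure (which additionally constrains batch normalization and ReLU).

```python
import numpy as np

def to_fixed_point(x, bits=4, frac_bits=2):
    """Quantize a float array to signed fixed point with frac_bits fractional bits."""
    scale = 1 << frac_bits                      # multiply == left bit-shift
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)

def from_fixed_point(q, frac_bits=2):
    return q.astype(np.float32) / (1 << frac_bits)  # divide == right bit-shift

# Toy usage: a 4-bit fixed-point round trip of some weights.
w = np.array([-1.7, -0.3, 0.24, 0.9, 1.6], dtype=np.float32)
q = to_fixed_point(w, bits=4, frac_bits=2)
print(q, from_fixed_point(q))   # integers in [-8, 7] and their float values
```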

Monte Carlo Deep Neural Network Arithmetic

Title Monte Carlo Deep Neural Network Arithmetic
Authors Anonymous
Abstract Quantization is a crucial technique for achieving low-power, low-latency and high-throughput hardware implementations of deep neural networks. Quantized floating-point representations have received recent interest due to their hardware efficiency benefits and their ability to represent a higher dynamic range than fixed-point representations, leading to improvements in accuracy. We present a novel technique, Monte Carlo Deep Neural Network Arithmetic (MCA), for determining the sensitivity of deep neural networks to quantization in floating-point arithmetic. We do this by applying Monte Carlo Arithmetic to the inference computation and analyzing the relative standard deviation of the neural network loss. The method makes no assumptions regarding the underlying parameter distributions. We evaluate our method on pre-trained image classification models on the CIFAR10 and ImageNet datasets. For the same network topology and dataset, we demonstrate the ability to gain the equivalent of additional bits of precision simply by choosing weight parameter sets that exhibit a lower loss of significance in the Monte Carlo trials. Additionally, MCA can be applied to compare the sensitivity of different network topologies to quantization effects.
Tasks Image Classification, Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=HyePberFvH
PDF https://openreview.net/pdf?id=HyePberFvH
PWC https://paperswithcode.com/paper/monte-carlo-deep-neural-network-arithmetic
Repo
Framework
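
A minimal sketch of the Monte Carlo Arithmetic idea the abstract applies to DNN inference: repeatedly perturb values with random relative error at a chosen virtual precision and look at the relative standard deviation of the resulting loss. Here a tiny linear model stands in for the network, and the perturbation is applied to the weights only; both are simplifying assumptions for illustration.

```python
import numpy as np

def mca_loss_spread(weights, loss_fn, virtual_precision=10, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(trials):
        # Random relative perturbation ~ uniform(-2^-t, 2^-t) per weight.
        eps = rng.uniform(-1.0, 1.0, size=weights.shape) * 2.0 ** -virtual_precision
        losses.append(loss_fn(weights * (1.0 + eps)))
    losses = np.array(losses)
    # Relative standard deviation: larger values mean higher quantization sensitivity.
    return losses.std() / np.abs(losses.mean())

# Toy usage: mean-squared error of a fixed linear model on random data,
# evaluated at a few virtual precisions.
rng = np.random.default_rng(1)
X, y, w = rng.normal(size=(64, 8)), rng.normal(size=64), rng.normal(size=8)
mse = lambda wp: float(np.mean((X @ wp - y) ** 2))
for t in (6, 10, 14):
    print(t, mca_loss_spread(w, mse, virtual_precision=t))
```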

Hybrid Weight Representation: A Quantization Method Represented with Ternary and Sparse-Large Weights

Title Hybrid Weight Representation: A Quantization Method Represented with Ternary and Sparse-Large Weights
Authors Anonymous
Abstract Previous ternarizations, such as trained ternary quantization (TTQ), quantize weights to three values (e.g., {−Wn, 0, +Wp}), achieving small model sizes and an efficient inference process. However, the extreme limit on the number of quantization steps causes some degradation in accuracy. To solve this problem, we propose a hybrid weight representation (HWR) method which produces a network consisting of two types of weights, i.e., ternary weights (TW) and sparse-large weights (SLW). The TW are similar to TTQ’s and require three states to be stored in memory with 2 bits. We utilize the one remaining 2-bit state to indicate an SLW, which is very rare and larger in magnitude than the TW. In HWR, we represent TW with values and SLW with indices into a table of values. By encoding SLW, the networks can preserve their model size while improving their accuracy. To fully utilize HWR, we also introduce a centralized quantization (CQ) process with a weighted ridge (WR) regularizer. They aim to reduce the entropy of the weight distributions by centralizing weights toward the ternary values. Our comprehensive experiments show that HWR outperforms state-of-the-art compressed models in terms of the trade-off between model size and accuracy. Our proposed representation increases AlexNet performance on CIFAR-100 by 4.15% with only a 1.13% increase in model size.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=H1gZsJBYwH
PDF https://openreview.net/pdf?id=H1gZsJBYwH
PWC https://paperswithcode.com/paper/hybrid-weight-representation-a-quantization
Repo
Framework
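
A minimal sketch of the hybrid weight representation described in the abstract: most weights become ternary codes {-Wn, 0, +Wp} stored in 2 bits, and the fourth 2-bit state flags a rare "sparse-large weight" whose actual value is looked up in a small side table. The thresholds and encoding layout below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

TERN_NEG, TERN_ZERO, TERN_POS, SLW_FLAG = 0, 1, 2, 3

def encode_hwr(w, wn, wp, large_thresh):
    codes = np.full(w.shape, TERN_ZERO, dtype=np.uint8)
    codes[w < -wn / 2] = TERN_NEG
    codes[w > wp / 2] = TERN_POS
    # Rare, very large weights keep their value in a separate table.
    large = np.abs(w) > large_thresh
    codes[large] = SLW_FLAG
    return codes, w[large].astype(np.float32)

def decode_hwr(codes, slw_values, wn, wp):
    w = np.zeros(codes.shape, dtype=np.float32)
    w[codes == TERN_NEG] = -wn
    w[codes == TERN_POS] = wp
    w[codes == SLW_FLAG] = slw_values   # same ordering as during encoding
    return w

# Toy usage: one large outlier weight survives exactly, the rest go ternary.
w = np.array([0.02, -0.4, 0.35, 1.9, -0.01], dtype=np.float32)
codes, slw = encode_hwr(w, wn=0.4, wp=0.4, large_thresh=1.0)
print(codes, decode_hwr(codes, slw, wn=0.4, wp=0.4))
```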

Goten: GPU-Outsourcing Trusted Execution of Neural Network Training and Prediction

Title Goten: GPU-Outsourcing Trusted Execution of Neural Network Training and Prediction
Authors Anonymous
Abstract Before we see worldwide collaborative efforts in training machine-learning models or widespread deployments of prediction-as-a-service, we need to devise an efficient privacy-preserving mechanism which guarantees the privacy of all stakeholders (data contributors, model owner, and queriers). Slalom (ICLR ’19) preserves privacy only for prediction, by leveraging both a trusted environment (e.g., Intel SGX) and an untrusted GPU. The challenges of enabling private training are explicitly left open: its pre-computation technique does not hide the model weights and fails to support the dynamic quantization required by the large changes in weight magnitudes during training. Moreover, it is not a truly outsourcing solution, since the (offline) pre-computation for a job takes as much time as computing the job locally with SGX, i.e., it only works until all pre-computations are exhausted. We propose Goten, a privacy-preserving framework supporting both training and prediction. We tackle all the above challenges by proposing a secure outsourcing protocol which 1) supports dynamic quantization, 2) hides the model weights from the GPU, and 3) performs better than a pure-SGX solution even if we perform the pre-computation online. Our solution leverages a non-colluding assumption, which is often employed by cryptographic solutions aiming for practical efficiency (IEEE SP ’13, Usenix Security ’17, PoPETs ’19). We use three servers, which can be reduced to two if the pre-computation is done offline. Furthermore, we implement tailor-made, memory-aware measures to minimize the overhead when the SGX memory limit is exceeded (cf. EuroSys ’17, Usenix ATC ’19). Compared to a pure-SGX solution, our experiments show that Goten speeds up linear-layer computations in VGG by up to 40× and speeds up VGG11 end-to-end by 8.64×.
Tasks Quantization
Published 2020-01-01
URL https://openreview.net/forum?id=S1xRnxSYwS
PDF https://openreview.net/pdf?id=S1xRnxSYwS
PWC https://paperswithcode.com/paper/goten-gpu-outsourcing-trusted-execution-of
Repo
Framework
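
A minimal sketch of the additive-blinding idea the abstract discusses in the context of Slalom-style outsourcing: the trusted enclave hides the input x behind a random mask r, the untrusted GPU computes W @ (x + r), and the enclave recovers W @ x by subtracting the precomputed W @ r. This illustrates why the precomputation of W @ r is the bottleneck the paper targets; it is not Goten's full protocol (which also hides the weights and supports training).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 256))       # linear-layer weights (known to the enclave)
x = rng.normal(size=256)              # private input held inside the enclave

# Offline (inside the enclave): choose a mask and precompute its image under W.
r = rng.normal(size=256)
Wr = W @ r

# Online: only the blinded input leaves the enclave; the untrusted GPU does
# the heavy matrix multiplication.
blinded = x + r
gpu_result = W @ blinded

# Back inside the enclave: unblind to recover W @ x.
Wx = gpu_result - Wr
print(np.allclose(Wx, W @ x))         # True
```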

Semi-supervised Pose Estimation with Geometric Latent Representations

Title Semi-supervised Pose Estimation with Geometric Latent Representations
Authors Anonymous
Abstract Pose estimation is the task of finding the orientation of an object within an image with respect to a fixed frame of reference. Current classification and regression approaches to the task require large quantities of labelled data, while the amount of labelled data for pose estimation is relatively limited. With this in mind, we propose the use of Conditional Variational Autoencoders (CVAEs) (Kingma et al., 2014) with circular latent representations to estimate the corresponding 2D rotations of an object. The method can be trained on datasets in which an arbitrary fraction of the images is labelled, providing comparable performance even when 10-20% of the image labels are missing.
Tasks Pose Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=S1et8gBKwH
PDF https://openreview.net/pdf?id=S1et8gBKwH
PWC https://paperswithcode.com/paper/semi-supervised-pose-estimation-with
Repo
Framework
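
A minimal sketch of the circular latent idea from the abstract: the encoder outputs a 2-D vector that is normalized onto the unit circle, so the latent code is effectively an angle (a 2-D rotation). This is a plain autoencoder for illustration; the paper uses a conditional VAE and a semi-supervised objective, which are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircularAutoencoder(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2))
        self.decoder = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        z = F.normalize(z, dim=-1)             # project onto the unit circle
        angle = torch.atan2(z[:, 1], z[:, 0])  # recoverable 2-D rotation angle
        return self.decoder(z), angle

# Toy usage: reconstruct 16-dimensional inputs and read off the latent angle.
model = CircularAutoencoder(in_dim=16)
recon, angle = model(torch.randn(4, 16))
print(recon.shape, angle.shape)
```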