April 2, 2020

Paper Group ANR 261

Weakly-Supervised Disentanglement Without Compromises. CAHPHF: Context-Aware Hierarchical QoS Prediction with Hybrid Filtering. Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning. Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training. Learning Nonlinear Loop Invariants with Gated Continuou …

Weakly-Supervised Disentanglement Without Compromises

Title Weakly-Supervised Disentanglement Without Compromises
Authors Francesco Locatello, Ben Poole, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen
Abstract Intelligent agents should be able to learn useful representations by observing changes in their environment. We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation. First, we theoretically show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations. Second, we provide practical algorithms that learn disentangled representations from pairs of images without requiring annotation of groups, individual factors, or the number of factors that have changed. Third, we perform a large-scale empirical study and show that such pairs of observations are sufficient to reliably learn disentangled representations on several benchmark data sets. Finally, we evaluate our learned representations and find that they are simultaneously useful on a diverse suite of tasks, including generalization under covariate shifts, fairness, and abstract reasoning. Overall, our results demonstrate that weak supervision enables learning of useful disentangled representations in realistic scenarios.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02886v1
PDF https://arxiv.org/pdf/2002.02886v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-disentanglement-without

CAHPHF: Context-Aware Hierarchical QoS Prediction with Hybrid Filtering

Title CAHPHF: Context-Aware Hierarchical QoS Prediction with Hybrid Filtering
Authors Ranjana Roy Chowdhury, Soumi Chattopadhyay, Chandranath Adak
Abstract With the proliferation of the Internet-of-Things and continuous growth in the number of web services at Internet scale, service recommendation is becoming a challenge. One of the prime aspects influencing service recommendation is the Quality-of-Service (QoS) parameter, which depicts the performance of a web service. In general, the service provider furnishes the value of the QoS parameters during service deployment. However, in reality, the QoS values of a service vary across different users, times, locations, etc. Therefore, estimating the QoS value of a service before its execution is an important task, and thus QoS prediction has gained significant research attention. Multiple approaches are available in the literature for predicting service QoS. However, these approaches are yet to reach the desired accuracy level. In this paper, we study the QoS prediction problem across different users, and propose a novel solution by taking into account the contextual information of both services and users. Our proposal includes two key steps: (a) hybrid filtering and (b) a hierarchical prediction mechanism. On the one hand, the hybrid filtering method aims to obtain a set of similar users and services, given a target user and a service. On the other hand, the goal of the hierarchical prediction mechanism is to estimate the QoS value accurately by leveraging hierarchical neural-regression. We evaluate our framework on the publicly available WS-DREAM datasets. The experimental results show that our framework outperforms the major state-of-the-art approaches.
Published 2020-01-13
URL https://arxiv.org/abs/2001.09897v1
PDF https://arxiv.org/pdf/2001.09897v1.pdf
PWC https://paperswithcode.com/paper/cahphf-context-aware-hierarchical-qos

Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning

Title Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning
Authors Rui Liu, Sanjay Krishnan, Aaron J. Elmore, Michael J. Franklin
Abstract As neural networks are increasingly employed in machine learning practice, organizations will have to determine how to share limited training resources among a diverse set of model training tasks. This paper studies jointly training multiple neural network models on a single GPU. We present an empirical study of this operation, called pack, and end-to-end experiments that suggest significant improvements for hyperparameter search systems. Our research prototype is in TensorFlow, and we evaluate performance across different models (ResNet, MobileNet, DenseNet, and MLP) and training scenarios. The results suggest: (1) packing two models can bring up to 40% performance improvement over unpacked setups for a single training step and the improvement increases when packing more models; (2) the benefit of a pack primitive largely depends on a number of factors including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-based Hyperband is up to 2.7x faster than the original Hyperband training method in our experiment setting, with this improvement growing as memory size increases and subsequently the density of models packed.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02885v1
PDF https://arxiv.org/pdf/2002.02885v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-optimizing-packed-neural

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

Title Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training
Authors Thomas O’Leary-Roseberry, Omar Ghattas
Abstract In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02882v1
PDF https://arxiv.org/pdf/2002.02882v1.pdf
PWC https://paperswithcode.com/paper/ill-posedness-and-optimization-geometry-for

Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks

Title Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks
Authors Jianan Yao, Gabriel Ryan, Justin Wong, Suman Jana, Ronghui Gu
Abstract In many cases, verifying real-world programs requires inferring loop invariants with nonlinear constraints. This is especially true in programs that perform many numerical operations, such as control systems for avionics or industrial plants. Recently, data-driven methods for loop invariant inference have gained popularity, especially on linear loop invariants. However, applying data-driven inference to nonlinear invariants is challenging due to the large numbers of and large magnitudes of high-order terms, the potential for overfitting on samples, and the large space of possible nonlinear inequality bounds. In this paper, we introduce a new neural architecture for general SMT learning, the Gated Continuous Logic Network (G-CLN), and apply it to nonlinear loop invariant learning. G-CLNs extend the Continuous Logic Network architecture with gating units and dropout, which allow the model to robustly learn general invariants over large numbers of terms. To address overfitting that arises from finite program sampling, we introduce fractional sampling, a sound relaxation of loop semantics to continuous functions that facilitates unbounded sampling on the real domain. We also design a new CLN activation function, the Piecewise Biased Quadratic Unit (PBQU), for naturally learning tight inequality bounds. We incorporate these methods into a nonlinear loop invariant inference system that can learn general nonlinear loop invariants. We evaluate our system on a benchmark of nonlinear loop invariants and show it solves 26 out of 27 problems, 3 more than prior work, with an average runtime of 53.3 seconds. We further demonstrate the generic learning ability of G-CLNs by solving all 124 problems in the linear Code2Inv benchmark. We also perform a quantitative stability evaluation and show G-CLNs have a convergence rate of 97.5% on quadratic problems, a 39.2% improvement over CLN models.
Published 2020-03-17
URL https://arxiv.org/abs/2003.07959v2
PDF https://arxiv.org/pdf/2003.07959v2.pdf
PWC https://paperswithcode.com/paper/learning-nonlinear-loop-invariants-with-gated
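
To make the notion of a nonlinear loop invariant concrete, the following is a minimal sketch and not the G-CLN system itself. It assumes a hypothetical toy loop that accumulates cubes, records the program state at each loop head, and checks a candidate polynomial invariant against those samples. The function names `trace_cube_sum`, `invariant`, and `holds_on_samples` are illustrative only.

```python
def trace_cube_sum(n):
    """Run a toy loop and record (x, y) at the loop head on every iteration."""
    x, y = 0, 0
    states = [(x, y)]
    while y < n:
        y += 1
        x += y ** 3
        states.append((x, y))
    return states


def invariant(x, y):
    """Candidate nonlinear invariant (hypothetical example): 4x == y^2 (y + 1)^2."""
    return 4 * x == y * y * (y + 1) * (y + 1)


def holds_on_samples(max_n):
    """Check the candidate on every sampled loop state for n = 0..max_n."""
    return all(invariant(x, y)
               for n in range(max_n + 1)
               for x, y in trace_cube_sum(n))
```

Agreement on finitely many samples only suggests that a candidate is an invariant; it does not prove it. A sound pipeline would still have to verify the candidate, for example with an SMT solver.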

SDOD:Real-time Segmenting and Detecting 3D Object by Depth

Title SDOD:Real-time Segmenting and Detecting 3D Object by Depth
Authors Caiyi Xu, Jianping Xing, Yafei Ning, Yonghong Chen, Yong Wu
Abstract Most existing instance segmentation methods only focus on 2D objects and are not suitable for 3D scenes such as autonomous driving. In this paper, we propose a model that splits instance segmentation and object detection into two parallel branches. We discretize each object's depth into depth categories (background set to 0, objects set to [1, K]), which transforms the instance segmentation task into a pixel-level classification task. The mask branch predicts pixel-level depth categories and the 3D branch predicts instance-level depth categories; we produce instance masks by assigning pixels that have the same depth category to each instance. In addition, to address the imbalance between mask labels and 3D labels in the KITTI dataset (200 for mask, 7481 for 3D), we introduce coarse masks generated by an auto-annotation model to increase the number of samples.
Tasks Autonomous Driving, Instance Segmentation, Object Detection, Semantic Segmentation
Published 2020-01-26
URL https://arxiv.org/abs/2001.09425v2
PDF https://arxiv.org/pdf/2001.09425v2.pdf
PWC https://paperswithcode.com/paper/sdodreal-time-segmenting-and-detecting-3d
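
The depth-category idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the uniform binning, the depth range `d_min`/`d_max`, the default `k`, and both function names are hypothetical, and a real pipeline would presumably also restrict matching to pixels near each detected object.

```python
def depth_to_category(depth, is_object, d_min=0.0, d_max=80.0, k=64):
    """Map a metric depth to a category: 0 for background, 1..k for objects.

    Uniform binning over [d_min, d_max] is an assumption made for this sketch.
    """
    if not is_object:
        return 0
    clipped = min(max(depth, d_min), d_max)
    bin_index = int((clipped - d_min) / (d_max - d_min) * k)
    # The upper edge d_max would land in bin k, so cap it at the last bin.
    return min(bin_index, k - 1) + 1


def instance_mask(pixel_categories, instance_category):
    """Assign to an instance every pixel whose depth category matches it."""
    return [[1 if c == instance_category else 0 for c in row]
            for row in pixel_categories]
```

For example, `instance_mask([[0, 3], [3, 5]], 3)` marks the two pixels whose predicted category equals the instance-level category 3.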

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

Title Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP
Authors Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau
Abstract An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While there are a number of methods proposed to generate adversarial examples for text data, it is not trivial to assess the quality of these adversarial examples, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in their meaning, readability and classification label. In this paper, we propose an evaluation framework to assess the quality of adversarial examples based on the aforementioned properties. We experiment with five benchmark attacking methods and an alternative approach based on an auto-encoder, and found that these methods generate adversarial examples with poor readability and content preservation. We also learned that there are multiple factors that can influence the attacking performance, such as the length of text examples and the input domain.
Published 2020-01-22
URL https://arxiv.org/abs/2001.07820v1
PDF https://arxiv.org/pdf/2001.07820v1.pdf
PWC https://paperswithcode.com/paper/elephant-in-the-room-an-evaluation-framework

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Title Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
Authors Sizhe Chen, Zhengbao He, Chengjin Sun, Xiaolin Huang
Abstract Adversarial attacks on deep neural networks (DNNs) have been found for several years. However, the existing adversarial attacks have high success rates only when the information of the attacked DNN is well-known or could be estimated by structure similarity or massive queries. In this paper, we propose an Attack on Attention (AoA), a semantic feature commonly shared by DNNs. The transferability of AoA is quite high. With no more than 10 queries of the decision only, AoA can achieve almost 100% success rate when attacking on many popular DNNs. Even without query, AoA could keep a surprisingly high attack performance. We apply AoA to generate 96020 adversarial samples from ImageNet to defeat many neural networks, and thus name the dataset as DAmageNet. 20 well-trained DNNs are tested on DAmageNet. Without adversarial training, most of the tested DNNs have an error rate over 90%. DAmageNet is the first universal adversarial dataset and it could serve as a benchmark for robustness testing and adversarial training.
Tasks Adversarial Attack
Published 2020-01-16
URL https://arxiv.org/abs/2001.06325v1
PDF https://arxiv.org/pdf/2001.06325v1.pdf
PWC https://paperswithcode.com/paper/universal-adversarial-attack-on-attention-and

Differentiable Fixed-Point Iteration Layer

Title Differentiable Fixed-Point Iteration Layer
Authors Younghan Jeon, Minsik Lee, Jin Young Choi
Abstract Recently, several studies proposed methods to utilize some restricted classes of optimization problems as layers of deep neural networks. However, these methods are still in their infancy and require special treatments, e.g., analyzing the KKT conditions, for deriving the backpropagation formula. Instead, in this paper, we propose a method to utilize fixed-point iteration (FPI), a generalization of many types of numerical algorithms, as a network layer. We show that the derivative of an FPI layer depends only on the fixed point, and then we present a method to calculate it efficiently using another FPI which we call the backward FPI. The proposed method can be easily implemented based on the autograd functionalities in existing deep learning tools. Since FPI covers a vast range of numerical algorithms in machine learning and other fields, it has a lot of potential applications. In the experiments, the differentiable FPI layer is applied to two scenarios, i.e., gradient descent iterations for differentiable optimization problems and FPI with arbitrary neural network modules, and the results demonstrate its simplicity and effectiveness.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02868v1
PDF https://arxiv.org/pdf/2002.02868v1.pdf
PWC https://paperswithcode.com/paper/differentiable-fixed-point-iteration-layer
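
The claim that the derivative depends only on the fixed point can be illustrated with a scalar sketch based on the implicit function theorem. This is an assumption-laden toy, not the paper's formulation: the map `f`, the tolerance values, and the function names are hypothetical. At a fixed point x* = f(x*, θ), the gradient with respect to θ is obtained by solving v = (∂f/∂x) v + g, which is itself a fixed-point iteration.

```python
import math


def f(x, theta):
    """Toy contraction map: |df/dx| <= 0.5, so the iteration converges."""
    return 0.5 * math.cos(x) + theta


def forward_fpi(theta, x0=0.0, tol=1e-12, max_iter=1000):
    """Iterate x <- f(x, theta) until the update falls below tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x, theta)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x


def backward_fpi(x_star, upstream=1.0, tol=1e-12, max_iter=1000):
    """Solve v = (df/dx) v + upstream by iteration, using only the fixed point."""
    df_dx = -0.5 * math.sin(x_star)
    v = 0.0
    for _ in range(max_iter):
        v_next = df_dx * v + upstream
        if abs(v_next - v) < tol:
            v = v_next
            break
        v = v_next
    # For this toy map df/dtheta == 1, so v is already the gradient w.r.t. theta.
    return v
```

Note that `backward_fpi` never looks at the intermediate forward iterates, only at `x_star`. Its result can be compared against a finite-difference estimate of `forward_fpi` as a sanity check.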

On the Estimation of Information Measures of Continuous Distributions

Title On the Estimation of Information Measures of Continuous Distributions
Authors Georg Pichler, Pablo Piantanida, Günther Koliander
Abstract The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02851v1
PDF https://arxiv.org/pdf/2002.02851v1.pdf
PWC https://paperswithcode.com/paper/on-the-estimation-of-information-measures-of
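
A simple histogram-based estimate of differential entropy, of the kind the abstract refers to, can be sketched as below. This is a generic plug-in estimator written for illustration; it does not compute the paper's confidence bounds. It assumes a one-dimensional density with known bounded support `[low, high]` and that every sample lies inside it; the function name and parameters are hypothetical.

```python
import math


def histogram_entropy(samples, low, high, bins):
    """Plug-in estimate of differential entropy (in nats) on support [low, high].

    Assumes every sample lies inside [low, high].
    """
    width = (high - low) / bins
    counts = [0] * bins
    for s in samples:
        # A sample exactly at `high` is placed in the last bin.
        idx = min(int((s - low) / width), bins - 1)
        counts[idx] += 1
    n = len(samples)
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / n
            # p / width is the estimated density inside the bin.
            entropy -= p * math.log(p / width)
    return entropy
```

For samples spread evenly over [0, 2], the estimate approaches log 2, the differential entropy of the uniform density on that interval.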

Distributional Reinforcement Learning with Ensembles

Title Distributional Reinforcement Learning with Ensembles
Authors Björn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl
Abstract It is well-known that ensemble methods often provide enhanced performance in reinforcement learning. In this paper we explore this concept further by using group-aided training within the distributional reinforcement learning paradigm. Specifically, we propose an extension to categorical reinforcement learning, where distributional learning targets are implicitly based on the total information gathered by an ensemble. We empirically show that this may lead to much more robust initial learning, a stronger individual performance level and good efficiency on a per-sample basis.
Tasks Distributional Reinforcement Learning
Published 2020-03-24
URL https://arxiv.org/abs/2003.10903v1
PDF https://arxiv.org/pdf/2003.10903v1.pdf
PWC https://paperswithcode.com/paper/distributional-reinforcement-learning-with-2

Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning

Title Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning
Authors Qianqian Zhang, Walid Saad, Mehdi Bennis
Abstract In this paper, a novel framework is proposed to optimize the downlink multi-user communication of a millimeter wave base station, which is assisted by a reconfigurable intelligent reflector (IR). In particular, a channel estimation approach is developed to measure the channel state information (CSI) in real-time. First, for a perfect CSI scenario, the optimal precoding transmission and power allocation is derived so as to maximize the sum of downlink rates towards multiple users, followed by the optimization of IR reflection coefficient to enhance the upper bound of the downlink transmission. Next, in the imperfect CSI scenario, a distributional reinforcement learning (DRL) approach is proposed to learn the optimal IR reflection and maximize the expectation of downlink capacity. In order to model the transmission rate’s probability distribution, a learning algorithm, based on quantile regression (QR), is developed, and the proposed QR-DRL method is proved to converge to a stable distribution of downlink transmission rate. Simulation results show that, in the error-free CSI scenario, the proposed transmission approach yields over 20% and 2-fold increase in the downlink sum-rate, compared with a fixed IR reflection scheme and direct transmission scheme, respectively. Simulation results also show that by increasing the number of IR components, the downlink rate can be improved faster than by increasing the number of antennas at the BS. Furthermore, under limited knowledge of CSI, simulation results show that the proposed QR-DRL method, which learns a full distribution of the downlink rate, yields a better prediction accuracy and improves the downlink rate by 10% for online deployments, compared with a Q-learning baseline.
Tasks Distributional Reinforcement Learning, Q-Learning
Published 2020-02-24
URL https://arxiv.org/abs/2002.10572v1
PDF https://arxiv.org/pdf/2002.10572v1.pdf
PWC https://paperswithcode.com/paper/millimeter-wave-communications-with-an
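
The quantile regression component mentioned in the abstract rests on the pinball loss, whose minimizer is a quantile of the target distribution. The sketch below shows only that building block on a fixed list of samples; it is not the QR-DRL algorithm, and the function names, learning rate, and step count are assumptions chosen for illustration.

```python
def pinball_loss(target, prediction, tau):
    """Quantile (pinball) loss for a single sample at quantile level tau."""
    error = target - prediction
    return tau * error if error >= 0 else (tau - 1.0) * error


def fit_quantile(samples, tau, lr=0.1, steps=500):
    """Estimate the tau-quantile by batch subgradient descent on the pinball loss."""
    q = 0.0
    n = len(samples)
    for _ in range(steps):
        above = sum(1 for y in samples if y > q)
        # Per-sample subgradient w.r.t. q: -tau if y > q, otherwise (1 - tau).
        grad = ((1.0 - tau) * (n - above) - tau * above) / n
        q -= lr * grad
    return q
```

Fitting several values of `tau` at once gives a discrete description of a whole distribution rather than only its mean, which is the general idea behind learning a full distribution of the downlink rate.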

Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification

Title Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification
Authors Meet P. Vadera, Satya Narayan Shukla, Brian Jalaian, Benjamin M. Marlin
Abstract In this paper, we consider the problem of assessing the adversarial robustness of deep neural network models under both Markov chain Monte Carlo (MCMC) and Bayesian Dark Knowledge (BDK) inference approximations. We characterize the robustness of each method to two types of adversarial attacks: the fast gradient sign method (FGSM) and projected gradient descent (PGD). We show that full MCMC-based inference has excellent robustness, significantly outperforming standard point estimation-based learning. On the other hand, BDK provides marginal improvements. As an additional contribution, we present a storage-efficient approach to computing adversarial examples for large Monte Carlo ensembles using both the FGSM and PGD attacks.
Published 2020-02-07
URL https://arxiv.org/abs/2002.02842v1
PDF https://arxiv.org/pdf/2002.02842v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-adversarial-robustness-of-monte
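
As a reminder of what the fast gradient sign method does, here is a minimal sketch on a single logistic-regression model with a cross-entropy loss. It is a deliberately small stand-in: the paper attacks deep Bayesian networks and Monte Carlo ensembles, which this example does not model, and the weights, names, and `epsilon` value are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a fixed logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, label, epsilon):
    """One-step FGSM: move x by epsilon along the sign of the loss gradient."""
    # For the cross-entropy loss, d(loss)/dx_i = (p - label) * w_i.
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
```

Each input coordinate moves by at most `epsilon`, and the perturbed input lowers the model's confidence in the true label. PGD can be viewed as repeating such a step several times with projection back onto the allowed perturbation set.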

Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Title Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
Authors Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon
Abstract Visual storytelling is the task of creating a short story based on photo streams. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions, but also human-like narration and semantics. However, the VIST dataset consists only of a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full plausible story even with missing photo(s). Furthermore, we propose for visual storytelling a hide-and-tell model, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and the network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods in automatic metrics. Finally, we qualitatively show the learned ability to interpolate storyline over visual gaps.
Tasks Image Captioning, Visual Storytelling
Published 2020-02-03
URL https://arxiv.org/abs/2002.00774v1
PDF https://arxiv.org/pdf/2002.00774v1.pdf
PWC https://paperswithcode.com/paper/hide-and-tell-learning-to-bridge-photo

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Title UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning
Authors Quan Hoang Lam, Quang Duy Le, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Abstract Image Captioning, the task of automatically generating image captions, has in recent years attracted attention from researchers in many fields of computer science, including computer vision, natural language processing and machine learning. This paper contributes to research on the Image Captioning task by extending datasets to a different language: Vietnamese. So far, no Image Captioning dataset exists for the Vietnamese language, so this is the first fundamental step toward developing Vietnamese Image Captioning. In this scope, we first build a dataset, which we call UIT-ViIC, containing manually written captions for images from the Microsoft COCO dataset relating to sports played with balls. UIT-ViIC consists of 19,250 Vietnamese captions for 3,850 images. Following that, we evaluate our dataset on deep neural network models and compare it with an English dataset and two Vietnamese datasets built by different methods. UIT-ViIC is published on our lab website for research purposes.
Tasks Image Captioning
Published 2020-02-01
URL https://arxiv.org/abs/2002.00175v1
PDF https://arxiv.org/pdf/2002.00175v1.pdf
PWC https://paperswithcode.com/paper/uit-viic-a-dataset-for-the-first-evaluation