April 2, 2020

3245 words 16 mins read

Paper Group ANR 261


Weakly-Supervised Disentanglement Without Compromises

Title Weakly-Supervised Disentanglement Without Compromises
Authors Francesco Locatello, Ben Poole, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen
Abstract Intelligent agents should be able to learn useful representations by observing changes in their environment. We model such observations as pairs of non-i.i.d. images sharing at least one of the underlying factors of variation. First, we theoretically show that only knowing how many factors have changed, but not which ones, is sufficient to learn disentangled representations. Second, we provide practical algorithms that learn disentangled representations from pairs of images without requiring annotation of groups, individual factors, or the number of factors that have changed. Third, we perform a large-scale empirical study and show that such pairs of observations are sufficient to reliably learn disentangled representations on several benchmark data sets. Finally, we evaluate our learned representations and find that they are simultaneously useful on a diverse suite of tasks, including generalization under covariate shifts, fairness, and abstract reasoning. Overall, our results demonstrate that weak supervision enables learning of useful disentangled representations in realistic scenarios.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02886v1
PDF https://arxiv.org/pdf/2002.02886v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-disentanglement-without
Repo
Framework
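
The key claim above is that pairs differing in an unknown number of factors suffice for disentanglement. As a rough illustration of that weak-supervision setup, the sketch below constructs pairs of ground-truth factor vectors that differ in exactly k factors; the factor ranges are hypothetical, and note that the paper's algorithms do not require k (or which factors changed) to be annotated.

```python
import numpy as np

rng = np.random.default_rng(0)
num_factors = 5
factor_sizes = np.array([3, 6, 40, 32, 32])    # hypothetical, dSprites-like

def sample_pair(k=1):
    """Return two factor vectors that differ in exactly k factors."""
    z1 = rng.integers(0, factor_sizes)
    z2 = z1.copy()
    for i in rng.choice(num_factors, size=k, replace=False):
        new = rng.integers(0, factor_sizes[i])
        while new == z1[i]:                    # resample until it differs
            new = rng.integers(0, factor_sizes[i])
        z2[i] = new
    return z1, z2

z1, z2 = sample_pair(k=1)                      # render both to get an image pair
print(z1, z2)                                  # identical except in one factor
```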

CAHPHF: Context-Aware Hierarchical QoS Prediction with Hybrid Filtering

Title CAHPHF: Context-Aware Hierarchical QoS Prediction with Hybrid Filtering
Authors Ranjana Roy Chowdhury, Soumi Chattopadhyay, Chandranath Adak
Abstract With the proliferation of the Internet-of-Things and continuous growth in the number of web services at Internet scale, service recommendation is becoming a challenge. One of the prime aspects influencing service recommendation is the Quality-of-Service (QoS) parameter, which depicts the performance of a web service. In general, the service provider furnishes the values of the QoS parameters during service deployment. In reality, however, the QoS values of a service vary across users, time, locations, etc. Therefore, estimating the QoS value of a service before its execution is an important task, and QoS prediction has thus gained significant research attention. Multiple approaches for predicting service QoS are available in the literature, but they have yet to reach the desired accuracy level. In this paper, we study the QoS prediction problem across different users and propose a novel solution that takes into account the contextual information of both services and users. Our proposal comprises two key steps: (a) hybrid filtering and (b) a hierarchical prediction mechanism. The hybrid filtering method aims to obtain a set of similar users and services, given a target user and a target service. The goal of the hierarchical prediction mechanism is to estimate the QoS value accurately by leveraging hierarchical neural regression. We evaluate our framework on the publicly available WS-DREAM datasets. The experimental results show that our framework outperforms the major state-of-the-art approaches.
Tasks
Published 2020-01-13
URL https://arxiv.org/abs/2001.09897v1
PDF https://arxiv.org/pdf/2001.09897v1.pdf
PWC https://paperswithcode.com/paper/cahphf-context-aware-hierarchical-qos
Repo
Framework
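
A minimal sketch of the filtering step described above, assuming a user-service QoS matrix with unobserved entries stored as zeros: select the top-k users most similar to a target user. The paper's hybrid filtering also folds in contextual similarity; plain cosine similarity is used here as a stand-in.

```python
import numpy as np

def top_k_similar_users(qos, target, k=3):
    """qos: (users, services) matrix, 0 for unobserved entries."""
    t = qos[target]
    norms = np.linalg.norm(qos, axis=1) * np.linalg.norm(t) + 1e-12
    sims = qos @ t / norms                     # cosine similarity to target
    sims[target] = -np.inf                     # exclude the target itself
    return np.argsort(sims)[-k:][::-1]

qos = np.random.rand(10, 6)                    # hypothetical response times
print(top_k_similar_users(qos, target=0))
```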

Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning

Title Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning
Authors Rui Liu, Sanjan Krishnan, Aaron J. Elmore, Michael J. Franklin
Abstract As neural networks are increasingly employed in machine learning practice, organizations will have to determine how to share limited training resources among a diverse set of model training tasks. This paper studies jointly training multiple neural network models on a single GPU. We present an empirical study of this operation, called pack, and end-to-end experiments that suggest significant improvements for hyperparameter search systems. Our research prototype is in TensorFlow, and we evaluate performance across different models (ResNet, MobileNet, DenseNet, and MLP) and training scenarios. The results suggest: (1) packing two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases when packing more models; (2) the benefit of a pack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-based Hyperband is up to 2.7x faster than the original Hyperband in our experimental setting, with this improvement growing as memory size, and hence the density of packed models, increases.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02885v1
PDF https://arxiv.org/pdf/2002.02885v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-optimizing-packed-neural
Repo
Framework
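
Since the prototype is in TensorFlow, here is a hedged toy sketch of what packing amounts to: two independent models, each with its own hyperparameters, trained inside a single fused GPU step so their kernels can overlap. The paper's actual pack primitive operates at the graph level; this is only the simplest tf.function approximation.

```python
import tensorflow as tf

def make_mlp():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

model_a, model_b = make_mlp(), make_mlp()
model_a.build((None, 784)); model_b.build((None, 784))
opt_a = tf.keras.optimizers.Adam(1e-3)         # each packed model keeps its
opt_b = tf.keras.optimizers.Adam(3e-4)         # own hyperparameters
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def packed_step(x, y):
    """One fused training step for both models."""
    with tf.GradientTape(persistent=True) as tape:
        loss_a = loss_fn(y, model_a(x))
        loss_b = loss_fn(y, model_b(x))
    opt_a.apply_gradients(zip(tape.gradient(loss_a, model_a.trainable_variables),
                              model_a.trainable_variables))
    opt_b.apply_gradients(zip(tape.gradient(loss_b, model_b.trainable_variables),
                              model_b.trainable_variables))
    return loss_a, loss_b

x = tf.random.normal([64, 784])
y = tf.random.uniform([64], 0, 10, dtype=tf.int64)
print(packed_step(x, y))
```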

Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training

Title Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training
Authors Thomas O’Leary-Roseberry, Omar Ghattas
Abstract In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02882v1
PDF https://arxiv.org/pdf/2002.02882v1.pdf
PWC https://paperswithcode.com/paper/ill-posedness-and-optimization-geometry-for
Repo
Framework
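
A small numerical illustration of the ill-posedness phenomenon, not the paper's exact construction: a one-hidden-unit ReLU network fit to realizable data has a singular Hessian at the global minimum, because ReLU's positive homogeneity gives a flat direction in parameter space.

```python
import torch

x = torch.linspace(-1.0, 1.0, 50)
w_star = torch.tensor([1.3, 0.7])              # hypothetical teacher weights
y = w_star[1] * torch.relu(w_star[0] * x)      # data realizable by the model

def loss(w):                                   # least-squares training loss
    return ((w[1] * torch.relu(w[0] * x) - y) ** 2).mean()

# Hessian at a global minimum: one eigenvalue is (numerically) zero, since
# the rescaling (w0, w1) -> (a*w0, w1/a), a > 0, leaves the output unchanged.
H = torch.autograd.functional.hessian(loss, w_star)
print(torch.linalg.eigvalsh(H))
```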

Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks

Title Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks
Authors Jianan Yao, Gabriel Ryan, Justin Wong, Suman Jana, Ronghui Gu
Abstract In many cases, verifying real-world programs requires inferring loop invariants with nonlinear constraints. This is especially true in programs that perform many numerical operations, such as control systems for avionics or industrial plants. Recently, data-driven methods for loop invariant inference have gained popularity, especially for linear loop invariants. However, applying data-driven inference to nonlinear invariants is challenging due to the large number and magnitude of high-order terms, the potential for overfitting on samples, and the large space of possible nonlinear inequality bounds. In this paper, we introduce a new neural architecture for general SMT learning, the Gated Continuous Logic Network (G-CLN), and apply it to nonlinear loop invariant learning. G-CLNs extend the Continuous Logic Network architecture with gating units and dropout, which allow the model to robustly learn general invariants over large numbers of terms. To address overfitting that arises from finite program sampling, we introduce fractional sampling, a sound relaxation of loop semantics to continuous functions that facilitates unbounded sampling on the real domain. We also design a new CLN activation function, the Piecewise Biased Quadratic Unit (PBQU), for naturally learning tight inequality bounds. We incorporate these methods into a nonlinear loop invariant inference system that can learn general nonlinear loop invariants. We evaluate our system on a benchmark of nonlinear loop invariants and show that it solves 26 out of 27 problems, 3 more than prior work, with an average runtime of 53.3 seconds. We further demonstrate the generic learning ability of G-CLNs by solving all 124 problems in the linear Code2Inv benchmark. We also perform a quantitative stability evaluation and show G-CLNs have a convergence rate of 97.5% on quadratic problems, a 39.2% improvement over CLN models.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.07959v2
PDF https://arxiv.org/pdf/2003.07959v2.pdf
PWC https://paperswithcode.com/paper/learning-nonlinear-loop-invariants-with-gated
Repo
Framework
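
One ingredient named above is the gating of logic units. As a heavily hedged reconstruction (the exact G-CLN formulation may differ), a gate g in [0, 1] can smoothly exclude a clause from a product t-norm conjunction by mapping its input toward 1, the identity of conjunction:

```python
import torch

def gated_conjunction(truth_values, gates):
    """truth_values, gates: tensors in [0, 1] with matching shapes."""
    contrib = gates * truth_values + (1.0 - gates)   # g = 0 contributes 1
    return contrib.prod(dim=-1)                      # product t-norm

t = torch.tensor([0.9, 0.2, 0.8])                    # clause truth values
g = torch.sigmoid(torch.tensor([3.0, -4.0, 2.5]))    # learned gate logits
print(gated_conjunction(t, g))   # the low-gate clause is effectively ignored
```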

SDOD: Real-time Segmenting and Detecting 3D Object by Depth

Title SDOD: Real-time Segmenting and Detecting 3D Object by Depth
Authors Caiyi Xu, Jianping Xing, Yafei Ning, Yonghong Chen, Yong Wu
Abstract Most existing instance segmentation methods only focus on 2D objects and are not suitable for 3D scenes such as autonomous driving. In this paper, we propose a model that splits instance segmentation and object detection into two parallel branches. We discretize object depths into depth categories (background set to 0, objects set to [1, K]), transforming the instance segmentation task into a pixel-level classification task. The mask branch predicts pixel-level depth categories, the 3D branch predicts instance-level depth categories, and we produce instance masks by assigning pixels with the same depth category to each instance. In addition, to address the imbalance between mask labels and 3D labels in the KITTI dataset (200 mask labels vs. 7,481 3D labels), we introduce coarse masks generated by an auto-annotation model to increase the number of training samples.
Tasks Autonomous Driving, Instance Segmentation, Object Detection, Semantic Segmentation
Published 2020-01-26
URL https://arxiv.org/abs/2001.09425v2
PDF https://arxiv.org/pdf/2001.09425v2.pdf
PWC https://paperswithcode.com/paper/sdodreal-time-segmenting-and-detecting-3d
Repo
Framework
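
A minimal sketch of the depth discretization described above, assuming uniform bin edges over a hypothetical depth range (the paper's quantization scheme may differ): background maps to category 0 and object depths to categories 1..K.

```python
import numpy as np

def discretize_depth(depth, object_mask, K=8, d_min=0.0, d_max=80.0):
    """depth: (H, W) metric depth; object_mask: (H, W) bool, True on objects."""
    edges = np.linspace(d_min, d_max, K + 1)
    cats = np.digitize(depth, edges[1:-1]) + 1   # object categories 1..K
    return np.where(object_mask, cats, 0)        # background -> category 0

depth = np.random.uniform(0.0, 80.0, (4, 4))
mask = np.random.rand(4, 4) > 0.5
print(discretize_depth(depth, mask))
```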

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

Title Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP
Authors Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau
Abstract An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While a number of methods have been proposed to generate adversarial examples for text data, it is not trivial to assess the quality of these adversarial examples, as minor perturbations (such as changing a word in a sentence) can lead to a significant shift in their meaning, readability, and classification label. In this paper, we propose an evaluation framework to assess the quality of adversarial examples based on the aforementioned properties. We experiment with five benchmark attack methods and an alternative approach based on an auto-encoder, and find that these methods generate adversarial examples with poor readability and content preservation. We also find that multiple factors can influence attack performance, such as the length of the text examples and the input domain.
Tasks
Published 2020-01-22
URL https://arxiv.org/abs/2001.07820v1
PDF https://arxiv.org/pdf/2001.07820v1.pdf
PWC https://paperswithcode.com/paper/elephant-in-the-room-an-evaluation-framework
Repo
Framework
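
A sketch of the kind of per-example scoring such a framework performs, with `embed`, `classify`, and `lm_perplexity` as hypothetical stand-ins for a sentence encoder, the victim classifier, and a language-model readability proxy:

```python
import numpy as np

def evaluate_pair(orig, adv, embed, classify, lm_perplexity):
    """Score one (original, adversarial) text pair on the three properties."""
    e1, e2 = embed(orig), embed(adv)
    sim = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-12)
    return {
        "label_flipped": classify(orig) != classify(adv),  # attack success
        "semantic_sim": float(sim),                        # meaning preserved?
        "perplexity": float(lm_perplexity(adv)),           # readability proxy
    }
```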

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

Title Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet
Authors Sizhe Chen, Zhengbao He, Chengjin Sun, Xiaolin Huang
Abstract Adversarial attacks on deep neural networks (DNNs) have been studied for several years. However, existing adversarial attacks achieve high success rates only when information about the attacked DNN is well known or can be estimated through structural similarity or massive queries. In this paper, we propose Attack on Attention (AoA), which exploits attention, a semantic feature commonly shared by DNNs. The transferability of AoA is quite high: with no more than 10 decision-only queries, AoA can achieve an almost 100% success rate when attacking many popular DNNs, and even without queries, AoA maintains surprisingly high attack performance. We apply AoA to generate 96,020 adversarial samples from ImageNet that defeat many neural networks, and accordingly name the dataset DAmageNet. Twenty well-trained DNNs are tested on DAmageNet; without adversarial training, most have an error rate over 90%. DAmageNet is the first universal adversarial dataset and can serve as a benchmark for robustness testing and adversarial training.
Tasks Adversarial Attack
Published 2020-01-16
URL https://arxiv.org/abs/2001.06325v1
PDF https://arxiv.org/pdf/2001.06325v1.pdf
PWC https://paperswithcode.com/paper/universal-adversarial-attack-on-attention-and
Repo
Framework
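
A heavily hedged sketch of an attention-targeted perturbation step; gradient-times-input saliency is used below as a crude stand-in for the attention heat map AoA actually attacks, with an FGSM-like update and inputs assumed to lie in [0, 1]:

```python
import torch

def attention_attack_step(model, x, y, eps=2.0 / 255):
    """One step that suppresses a saliency proxy for the true class."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.view(-1, 1)).sum()       # true-class logits
    grad_x, = torch.autograd.grad(score, x, create_graph=True)
    saliency = (grad_x * x).abs().sum()   # crude stand-in for attention
    saliency.backward()                   # d(saliency)/dx via double backward
    return (x - eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```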

Differentiable Fixed-Point Iteration Layer

Title Differentiable Fixed-Point Iteration Layer
Authors Younahan Jeon, Minsik Lee, Jin Young Choi
Abstract Recently, several studies have proposed methods to utilize some restricted classes of optimization problems as layers of deep neural networks. However, these methods are still in their infancy and require special treatment, e.g., analyzing the KKT conditions, to derive the backpropagation formula. Instead, in this paper, we propose a method to utilize fixed-point iteration (FPI), a generalization of many types of numerical algorithms, as a network layer. We show that the derivative of an FPI layer depends only on the fixed point, and we present a method to calculate it efficiently using another FPI, which we call the backward FPI. The proposed method can be easily implemented based on the autograd functionality in existing deep learning tools. Since FPI covers many different types of numerical algorithms in machine learning and other fields, it has many potential applications. In the experiments, the differentiable FPI layer is applied to two scenarios, gradient descent iterations for differentiable optimization problems and FPI with arbitrary neural network modules, and the results demonstrate its simplicity and effectiveness.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02868v1
PDF https://arxiv.org/pdf/2002.02868v1.pdf
PWC https://paperswithcode.com/paper/differentiable-fixed-point-iteration-layer
Repo
Framework
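
A sketch of how an FPI layer with a backward FPI can sit on top of autograd, in the spirit described above (the paper's exact formulation may differ): the forward pass iterates z <- f(z, x) without building a graph, and the backward pass solves the adjoint fixed point g = (df/dz)^T g + grad with another iteration.

```python
import torch

class FPILayer(torch.nn.Module):
    """z* with z* = f(z*, x); gradients via a backward fixed-point iteration."""
    def __init__(self, f, fwd_iters=100, bwd_iters=100):
        super().__init__()
        self.f, self.fwd_iters, self.bwd_iters = f, fwd_iters, bwd_iters

    def forward(self, x):
        with torch.no_grad():                        # forward FPI, no graph
            z = torch.zeros_like(x)
            for _ in range(self.fwd_iters):
                z = self.f(z, x)
        z = self.f(z, x)                             # one tracked application
        z0 = z.clone().detach().requires_grad_()
        f0 = self.f(z0, x)

        def backward_hook(grad):                     # solve g = J^T g + grad
            g = grad
            for _ in range(self.bwd_iters):
                g = torch.autograd.grad(f0, z0, g, retain_graph=True)[0] + grad
            return g

        z.register_hook(backward_hook)
        return z

layer = FPILayer(lambda z, x: 0.5 * torch.tanh(z) + x)   # a contraction in z
x = torch.randn(4, requires_grad=True)
layer(x).sum().backward()
print(x.grad)                                            # implicit gradient
```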

On the Estimation of Information Measures of Continuous Distributions

Title On the Estimation of Information Measures of Continuous Distributions
Authors Georg Pichler, Pablo Piantanida, Günther Koliander
Abstract The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with known Lipschitz constant and known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02851v1
PDF https://arxiv.org/pdf/2002.02851v1.pdf
PWC https://paperswithcode.com/paper/on-the-estimation-of-information-measures-of
Repo
Framework
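
For the simple histogram estimator analyzed above (shown here in one dimension on a known, bounded support), the estimate is h_hat = -sum_i p_i log(p_i / w_i) with bin probabilities p_i and widths w_i; a minimal sketch:

```python
import numpy as np

def histogram_entropy(samples, bins=32, support=(0.0, 1.0)):
    """Plug-in differential entropy estimate (in nats) from a histogram."""
    counts, edges = np.histogram(samples, bins=bins, range=support)
    p = counts / counts.sum()
    widths = np.diff(edges)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

x = np.random.beta(2.0, 5.0, size=10_000)   # known bounded support [0, 1]
print(histogram_entropy(x))                 # true h(Beta(2, 5)) is about -0.48
```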

Distributional Reinforcement Learning with Ensembles

Title Distributional Reinforcement Learning with Ensembles
Authors Björn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl
Abstract It is well-known that ensemble methods often provide enhanced performance in reinforcement learning. In this paper we explore this concept further by using group-aided training within the distributional reinforcement learning paradigm. Specifically, we propose an extension to categorical reinforcement learning, where distributional learning targets are implicitly based on the total information gathered by an ensemble. We empirically show that this may lead to much more robust initial learning, a stronger individual performance level and good efficiency on a per-sample basis.
Tasks Distributional Reinforcement Learning
Published 2020-03-24
URL https://arxiv.org/abs/2003.10903v1
PDF https://arxiv.org/pdf/2003.10903v1.pdf
PWC https://paperswithcode.com/paper/distributional-reinforcement-learning-with-2
Repo
Framework
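
A hedged sketch of the group-aided target idea: in categorical (C51-style) distributional RL, each ensemble member would normally regress toward its own projected target pmf; the simplest ensemble-implicit variant pools the members' targets, e.g. by averaging. The paper's exact aggregation may differ.

```python
import numpy as np

def ensemble_categorical_target(member_targets):
    """member_targets: (M, num_atoms) projected target pmfs, one per member."""
    pooled = member_targets.mean(axis=0)     # pool the ensemble's information
    return pooled / pooled.sum()             # renormalize for safety

targets = np.random.dirichlet(np.ones(51), size=5)   # 5 members, 51 atoms
print(ensemble_categorical_target(targets).sum())    # a valid pmf
```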

Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning

Title Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning
Authors Qianqian Zhang, Walid Saad, Mehdi Bennis
Abstract In this paper, a novel framework is proposed to optimize the downlink multi-user communication of a millimeter wave base station assisted by a reconfigurable intelligent reflector (IR). In particular, a channel estimation approach is developed to measure the channel state information (CSI) in real time. First, for a perfect CSI scenario, the optimal precoding transmission and power allocation are derived so as to maximize the sum of downlink rates toward multiple users, followed by optimization of the IR reflection coefficients to raise the upper bound of the downlink transmission. Next, in the imperfect CSI scenario, a distributional reinforcement learning (DRL) approach is proposed to learn the optimal IR reflection and maximize the expected downlink capacity. To model the transmission rate's probability distribution, a learning algorithm based on quantile regression (QR) is developed, and the proposed QR-DRL method is proved to converge to a stable distribution of the downlink transmission rate. Simulation results show that, in the error-free CSI scenario, the proposed transmission approach yields over a 20% and a two-fold increase in the downlink sum-rate compared with a fixed IR reflection scheme and a direct transmission scheme, respectively. They also show that increasing the number of IR components improves the downlink rate faster than increasing the number of antennas at the base station. Furthermore, under limited knowledge of CSI, the proposed QR-DRL method, which learns a full distribution of the downlink rate, yields better prediction accuracy and improves the downlink rate by 10% for online deployments compared with a Q-learning baseline.
Tasks Distributional Reinforcement Learning, Q-Learning
Published 2020-02-24
URL https://arxiv.org/abs/2002.10572v1
PDF https://arxiv.org/pdf/2002.10572v1.pdf
PWC https://paperswithcode.com/paper/millimeter-wave-communications-with-an
Repo
Framework
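
A sketch of the quantile-regression ingredient of QR-DRL above, using the standard quantile Huber loss (as in QR-DQN) with fixed quantile fractions tau_i = (i + 0.5)/N; the downlink-rate targets here are placeholders:

```python
import torch

def quantile_huber_loss(theta, target, kappa=1.0):
    """theta: (N,) quantile estimates; target: (M,) samples of the rate."""
    N = theta.shape[0]
    tau = (torch.arange(N, dtype=theta.dtype) + 0.5) / N
    u = target.view(1, -1) - theta.view(-1, 1)        # (N, M) pairwise errors
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u ** 2,
                        kappa * (u.abs() - 0.5 * kappa))
    weight = (tau.view(-1, 1) - (u < 0).float()).abs()
    return (weight * huber).mean()

theta = torch.zeros(32, requires_grad=True)           # 32 learned quantiles
quantile_huber_loss(theta, torch.randn(64)).backward()
```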

Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification

Title Assessing the Adversarial Robustness of Monte Carlo and Distillation Methods for Deep Bayesian Neural Network Classification
Authors Meet P. Vadera, Satya Narayan Shukla, Brian Jalaian, Benjamin M. Marlin
Abstract In this paper, we consider the problem of assessing the adversarial robustness of deep neural network models under both Markov chain Monte Carlo (MCMC) and Bayesian Dark Knowledge (BDK) inference approximations. We characterize the robustness of each method to two types of adversarial attacks: the fast gradient sign method (FGSM) and projected gradient descent (PGD). We show that full MCMC-based inference has excellent robustness, significantly outperforming standard point estimation-based learning. On the other hand, BDK provides marginal improvements. As an additional contribution, we present a storage-efficient approach to computing adversarial examples for large Monte Carlo ensembles using both the FGSM and PGD attacks.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02842v1
PDF https://arxiv.org/pdf/2002.02842v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-adversarial-robustness-of-monte
Repo
Framework
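
A hedged sketch of the storage-efficient ensemble attack idea: rather than materializing the whole Monte Carlo ensemble, stream posterior weight samples and accumulate the input gradient one sample at a time before taking a single FGSM step. `sample_weights` is a hypothetical generator of posterior state dicts, and averaging per-sample loss gradients is one natural variant.

```python
import torch
import torch.nn.functional as F

def fgsm_on_ensemble(model, sample_weights, x, y, eps=0.03):
    """FGSM against a streamed Monte Carlo ensemble; O(1) sample storage."""
    x = x.clone().requires_grad_(True)
    grad_sum, n = torch.zeros_like(x), 0
    for state_dict in sample_weights():        # stream posterior samples
        model.load_state_dict(state_dict)
        loss = F.cross_entropy(model(x), y)
        grad_sum += torch.autograd.grad(loss, x)[0]
        n += 1
    return (x + eps * (grad_sum / n).sign()).detach()
```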

Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Title Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
Authors Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon
Abstract Visual storytelling is the task of creating a short story based on a photo stream. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions but also human-like narration and semantics. However, the VIST dataset consists of only a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full, plausible story even with the missing photo(s). Furthermore, we propose a hide-and-tell model for visual storytelling, designed to learn non-local relations across photo streams and to refine and improve conventional RNN-based models. In experiments, we show that our hide-and-tell scheme and network design are indeed effective at storytelling, and that our model outperforms previous state-of-the-art methods on automatic metrics. Finally, we qualitatively show the learned ability to interpolate a storyline over visual gaps.
Tasks Image Captioning, Visual Storytelling
Published 2020-02-03
URL https://arxiv.org/abs/2002.00774v1
PDF https://arxiv.org/pdf/2002.00774v1.pdf
PWC https://paperswithcode.com/paper/hide-and-tell-learning-to-bridge-photo
Repo
Framework
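
A minimal sketch of the training-time omission described above: hide one or more photos from the input stack (here by zeroing their features, a placeholder choice) so the network must imagine the missing step.

```python
import torch

def hide_photos(photo_feats, max_hidden=1):
    """photo_feats: (N, D) per-photo features for one story."""
    n = photo_feats.shape[0]
    k = int(torch.randint(1, max_hidden + 1, ()).item())
    hidden = torch.randperm(n)[:k]            # which photo(s) to hide
    out = photo_feats.clone()
    out[hidden] = 0.0                         # placeholder for hidden photos
    return out, hidden

feats = torch.randn(5, 512)                   # e.g. 5 photos per VIST story
masked, hidden = hide_photos(feats)
print(hidden)
```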

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Title UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning
Authors Quan Hoang Lam, Quang Duy Le, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Abstract Image Captioning, the task of automatically generating image captions, has attracted attention from researchers in many fields of computer science, including computer vision, natural language processing, and machine learning, in recent years. This paper contributes to Image Captioning research by extending the task to a different language: Vietnamese. So far, no Image Captioning dataset has existed for Vietnamese, so this is a fundamental first step toward developing Vietnamese Image Captioning. To this end, we first build a dataset, UIT-ViIC, containing manually written captions for images from the Microsoft COCO dataset related to sports played with balls. UIT-ViIC consists of 19,250 Vietnamese captions for 3,850 images. We then evaluate the dataset on deep neural network models and compare it with an English dataset and two Vietnamese datasets built by different methods. UIT-ViIC is published on our lab website for research purposes.
Tasks Image Captioning
Published 2020-02-01
URL https://arxiv.org/abs/2002.00175v1
PDF https://arxiv.org/pdf/2002.00175v1.pdf
PWC https://paperswithcode.com/paper/uit-viic-a-dataset-for-the-first-evaluation
Repo
Framework