April 3, 2020

Paper Group ANR 49

Large Scale Many-Objective Optimization Driven by Distributional Adversarial Networks. Text-based inference of moral sentiment change. Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning. Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization. Human-to-Robot Attention Transfer for Robot Execution F …

Large Scale Many-Objective Optimization Driven by Distributional Adversarial Networks

Title Large Scale Many-Objective Optimization Driven by Distributional Adversarial Networks
Authors Zhenyu Liang, Yunfan Li, Zhongwei Wan
Abstract Estimation of distribution algorithms (EDAs), a class of evolutionary algorithms (EAs), are stochastic optimization methods that build a probability model describing the distribution of solutions and randomly sample that model to create offspring, then update the model and the population. The Reference Vector Guided Evolutionary Algorithm (RVEA), based on the EDA framework, performs well on many-objective optimization problems (MaOPs). Moreover, using generative adversarial networks to generate offspring solutions, instead of crossover and mutation, is also a state-of-the-art idea in EAs. In this paper, we propose a novel algorithm based on the RVEA [1] framework that uses Distributional Adversarial Networks (DAN) [2] to generate new offspring. DAN uses a new distributional framework for the adversarial training of neural networks that operates on genuine samples rather than single points; this framework leads to more stable training and much better mode coverage than single-point-sample methods. As a result, DAN can quickly generate well-converged offspring that follow the same distribution as the data. In addition, we adopt the new two-stage position-update strategy of Large-Scale Multi-Objective Optimization Based on a Competitive Swarm Optimizer (LMOCSO) [3] to significantly increase the efficiency of the search for optimal solutions in a huge decision space. The proposed algorithm will be tested on the 9 benchmark problems of the large-scale multi-objective problem (LSMOP) suite. To measure its performance, we will compare the proposed algorithm with state-of-the-art EAs, e.g., RM-MEDA [4], MO-CMA [10], and NSGA-II.
Tasks Stochastic Optimization
Published 2020-03-16
URL https://arxiv.org/abs/2003.07013v1
PDF https://arxiv.org/pdf/2003.07013v1.pdf
PWC https://paperswithcode.com/paper/large-scale-many-objective-optimization
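
RVEA, the framework this proposal builds on, ranks candidate solutions with an angle-penalized distance (APD) that trades convergence off against diversity. The sketch below follows the APD formula as we recall it from the original RVEA paper; it is not code from this work. It assumes objective vectors are already translated so that the ideal point is the origin, the default exponent `alpha = 2` is an assumption, and the DAN offspring generator and the LMOCSO update are not modeled.

```python
import math
from typing import List, Sequence


def angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp the cosine to [-1, 1] so rounding error cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))


def smallest_neighbor_angle(refs: List[Sequence[float]], j: int) -> float:
    """gamma_j: smallest angle between reference vector j and any other reference vector."""
    return min(angle(refs[j], refs[k]) for k in range(len(refs)) if k != j)


def apd(f: Sequence[float], v: Sequence[float], gamma: float,
        t: int, t_max: int, alpha: float = 2.0) -> float:
    """Angle-penalized distance of a translated objective vector f w.r.t. reference vector v.

    The angle penalty grows with the generation ratio t / t_max.
    """
    m = len(f)  # number of objectives
    theta = angle(f, v)
    penalty = m * (t / t_max) ** alpha * theta / gamma
    return (1.0 + penalty) * math.sqrt(sum(x * x for x in f))


if __name__ == "__main__":
    s = math.sqrt(0.5)
    refs = [(1.0, 0.0), (s, s), (0.0, 1.0)]
    gamma = smallest_neighbor_angle(refs, 1)  # about pi / 4
    print(apd((1.0, 0.5), refs[1], gamma, t=10, t_max=100))
    print(apd((1.0, 0.5), refs[1], gamma, t=100, t_max=100))
```

Early in a run (small `t`) the score is close to the plain norm, which rewards convergence; late in the run the angle term dominates, which rewards solutions lying close to their reference vector.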

Text-based inference of moral sentiment change

Title Text-based inference of moral sentiment change
Authors Jing Yi Xie, Renato Ferreira Pinto Jr., Graeme Hirst, Yang Xu
Abstract We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. Our framework is based on the premise that language use can inform people’s moral perception toward right or wrong, and we build our methodology by exploring moral biases learned from diachronic word embeddings. We demonstrate how a parameter-free model supports inference of historical shifts in moral sentiment toward concepts such as slavery and democracy over centuries at three incremental levels: moral relevance, moral polarity, and fine-grained moral dimensions. We apply this methodology to visualizing moral time courses of individual concepts and analyzing the relations between psycholinguistic variables and rates of moral sentiment change at scale. Our work offers opportunities for applying natural language processing toward characterizing moral sentiment change in society.
Tasks Word Embeddings
Published 2020-01-20
URL https://arxiv.org/abs/2001.07209v1
PDF https://arxiv.org/pdf/2001.07209v1.pdf
PWC https://paperswithcode.com/paper/text-based-inference-of-moral-sentiment-1
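
To make the idea of reading moral polarity off word embeddings concrete, here is a minimal centroid-style scorer: a concept is scored by its distance to the mean embedding of positive and of negative moral seed words. The vectors, seed sets, and the exp(-distance) normalization are hypothetical illustrations; the paper's own models and lexicons are not reproduced. Repeating the score with embeddings trained on each historical period would give a moral time course for the concept.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def moral_polarity(concept: Vector, positive: List[Vector], negative: List[Vector]) -> float:
    """Probability-like score that `concept` is morally positive.

    Weights each class by exp(-distance to its centroid) and normalizes,
    so the closer centroid receives the larger share.
    """
    e_pos = math.exp(-euclidean(concept, centroid(positive)))
    e_neg = math.exp(-euclidean(concept, centroid(negative)))
    return e_pos / (e_pos + e_neg)


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical values, not trained vectors).
    positive_seeds = [[1.0, 0.0], [0.9, 0.1]]
    negative_seeds = [[-1.0, 0.0], [-0.9, -0.1]]
    for year, vec in {1850: [-0.5, 0.0], 1990: [0.6, 0.1]}.items():
        print(year, round(moral_polarity(vec, positive_seeds, negative_seeds), 3))
```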

Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning

Title Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning
Authors Yu Kobayashi, Hideaki Iiduka
Abstract This paper proposes a conjugate-gradient-based Adam algorithm blending Adam with nonlinear conjugate gradient methods and shows its convergence analysis. Numerical experiments on text classification and image classification show that the proposed algorithm can train deep neural network models in fewer epochs than the existing adaptive stochastic optimization algorithms can.
Tasks Image Classification, Stochastic Optimization, Text Classification
Published 2020-02-29
URL https://arxiv.org/abs/2003.00231v2
PDF https://arxiv.org/pdf/2003.00231v2.pdf
PWC https://paperswithcode.com/paper/conjugate-gradient-based-adam-for-stochastic
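
The abstract does not spell out the update rule, so the following is only a hedged sketch of one way to blend Adam with a nonlinear conjugate gradient direction: the first moment tracks a direction that mixes the current gradient with the previous direction through a damped Fletcher-Reeves coefficient. The damping schedule `gamma_scale / t`, the choice to keep the second moment on the raw gradient, and all names are assumptions, not the authors' method.

```python
import math
from typing import Callable, List

Vec = List[float]


def cg_adam(grad: Callable[[Vec], Vec], x0: Vec, steps: int = 3000, lr: float = 0.01,
            beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
            gamma_scale: float = 0.1) -> Vec:
    """Adam whose first moment tracks a conjugate-gradient-style direction.

    Hypothetical variant: s_t = g_t + gamma_t * s_{t-1}, with gamma_t a
    Fletcher-Reeves ratio damped by gamma_scale / t (s is in gradient sign
    convention, so the descent direction is -s).
    """
    x = list(x0)
    n = len(x)
    m = [0.0] * n  # first moment of the conjugate direction
    v = [0.0] * n  # second moment of the raw gradient
    s = [0.0] * n  # previous conjugate direction
    prev_sq = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        # Damped Fletcher-Reeves coefficient ||g_t||^2 / ||g_{t-1}||^2.
        gamma = (gamma_scale / t) * g_sq / prev_sq if prev_sq > 0.0 else 0.0
        s = [gi + gamma * si for gi, si in zip(g, s)]
        m = [beta1 * mi + (1 - beta1) * si for mi, si in zip(m, s)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_corr = 1.0 / (1 - beta1 ** t)  # bias corrections as in Adam
        v_corr = 1.0 / (1 - beta2 ** t)
        x = [xi - lr * (mi * m_corr) / (math.sqrt(vi * v_corr) + eps)
             for xi, mi, vi in zip(x, m, v)]
        prev_sq = g_sq
    return x


if __name__ == "__main__":
    # Ill-conditioned quadratic f(x) = (x0 - 3)^2 + 10 (x1 + 1)^2, minimum at (3, -1).
    grad_f = lambda x: [2 * (x[0] - 3), 20 * (x[1] + 1)]
    print(cg_adam(grad_f, [0.0, 0.0]))
```

Because the Fletcher-Reeves term is damped toward zero, the method behaves like plain Adam late in training; the damping is what keeps the sketch stable, and it is a design choice of this illustration only.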

Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization

Title Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization
Authors Nirav Bhavsar, Prashanth L A
Abstract We consider the problem of optimizing an objective function with and without convexity in a simulation-optimization context, where only stochastic zeroth-order information is available. We consider two techniques for estimating the gradient/Hessian, namely simultaneous perturbation (SP) and Gaussian smoothing (GS). We introduce an optimization oracle to capture a setting where the function measurements have an estimation error that can be controlled. Our oracle is appealing in several practical contexts where the objective has to be estimated from i.i.d. samples, and increasing the number of samples reduces the estimation error. In the stochastic non-convex optimization context, we analyze the zeroth-order variant of the randomized stochastic gradient (RSG) and quasi-Newton (RSQN) algorithms with a biased gradient/Hessian oracle, and with its variant involving an estimation error component. In particular, we provide non-asymptotic bounds on the performance of both algorithms, and our results provide a guideline for choosing the batch size for estimation, so that the overall error bound matches the one obtained when there is no estimation error. Next, in the stochastic convex optimization setting, we provide non-asymptotic bounds that hold in expectation for the last iterate of a stochastic gradient descent (SGD) algorithm, and our bound for the GS variant of SGD matches the bound for SGD with unbiased gradient information. We perform simulation experiments on synthetic as well as real-world datasets, and the empirical results validate the theoretical findings.
Tasks Stochastic Optimization
Published 2020-02-26
URL https://arxiv.org/abs/2002.11440v1
PDF https://arxiv.org/pdf/2002.11440v1.pdf
PWC https://paperswithcode.com/paper/non-asymptotic-bounds-for-zeroth-order
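
The two gradient estimators named in the abstract can be sketched from function values alone. Below, `gs_gradient` is a one-sample Gaussian-smoothing estimator and `sp_gradient` a two-sided simultaneous-perturbation estimator with +/-1 perturbations; the smoothing and perturbation sizes are illustrative choices. Averaging many directions is our own variance-reduction device for the demo; in the paper, the batch size instead controls the estimation error of each noisy function measurement.

```python
import random
from typing import Callable, List

Vec = List[float]
Objective = Callable[[Vec], float]


def gs_gradient(f: Objective, x: Vec, mu: float, rng: random.Random) -> Vec:
    """One-sample Gaussian smoothing: (f(x + mu*u) - f(x)) / mu * u with u ~ N(0, I)."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    scale = (f([xi + mu * ui for xi, ui in zip(x, u)]) - f(x)) / mu
    return [scale * ui for ui in u]


def sp_gradient(f: Objective, x: Vec, c: float, rng: random.Random) -> Vec:
    """One-sample two-sided simultaneous perturbation with Rademacher (+/-1) directions."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    f_plus = f([xi + c * di for xi, di in zip(x, delta)])
    f_minus = f([xi - c * di for xi, di in zip(x, delta)])
    diff = (f_plus - f_minus) / (2.0 * c)
    return [diff / di for di in delta]


def averaged_gradient(estimator: Callable[[Objective, Vec, float, random.Random], Vec],
                      f: Objective, x: Vec, h: float, batch: int,
                      rng: random.Random) -> Vec:
    """Average `batch` independent one-sample estimates to reduce variance."""
    total = [0.0] * len(x)
    for _ in range(batch):
        total = [a + b for a, b in zip(total, estimator(f, x, h, rng))]
    return [a / batch for a in total]


if __name__ == "__main__":
    rng = random.Random(0)
    sphere = lambda z: z[0] ** 2 + z[1] ** 2  # true gradient at (1, -2) is (2, -4)
    print(averaged_gradient(gs_gradient, sphere, [1.0, -2.0], 1e-2, 40000, rng))
    print(averaged_gradient(sp_gradient, sphere, [1.0, -2.0], 1e-2, 40000, rng))
```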

Human-to-Robot Attention Transfer for Robot Execution Failure Avoidance Using Stacked Neural Networks

Title Human-to-Robot Attention Transfer for Robot Execution Failure Avoidance Using Stacked Neural Networks
Authors Boyi Song, Yuntao Peng, Ruijiao Luo, Rui Liu
Abstract Due to world dynamics and hardware uncertainty, robots inevitably fail in task executions, leading to undesired or even dangerous executions. To avoid failures and improve robot performance, it is critical to identify and correct abnormal robot executions at an early stage. However, limited by its reasoning capability and knowledge level, a robot finds it challenging to self-diagnose and correct its abnormal behaviors. To solve this problem, a novel method, human-to-robot attention transfer (H2R-AT), is proposed to seek help from a human. H2R-AT is developed based on a novel stacked neural networks model, transferring human attention embedded in verbal reminders to robot attention embedded in robot visual perception. With the attention transfer from a human, a robot understands what the human's concerns are and where they lie, and can then identify and correct its abnormal executions. To validate the effectiveness of H2R-AT, two representative task scenarios with abnormal robot executions, “serve water for a human in a kitchen” and “pick up a defective gear in a factory”, were designed in the open-access simulation platform V-REP; 252 volunteers were recruited to provide about 12000 verbal reminders to train and test the attention transfer model H2R-AT. With an accuracy of 73.68% in transferring attention and an accuracy of 66.86% in avoiding robot execution failures, the effectiveness of H2R-AT was validated.
Published 2020-02-11
URL https://arxiv.org/abs/2002.04242v1
PDF https://arxiv.org/pdf/2002.04242v1.pdf
PWC https://paperswithcode.com/paper/human-to-robot-attention-transfer-for-robot

A deep-learning view of chemical space designed to facilitate drug discovery

Title A deep-learning view of chemical space designed to facilitate drug discovery
Authors Paul Maragakis, Hunter Nisonoff, Brian Cole, David E. Shaw
Abstract Drug discovery projects entail cycles of design, synthesis, and testing that yield a series of chemically related small molecules whose properties, such as binding affinity to a given target protein, are progressively tailored to a particular drug discovery goal. The use of deep learning technologies could augment the typical practice of using human intuition in the design cycle, and thereby expedite drug discovery projects. Here we present DESMILES, a deep neural network model that advances the state of the art in machine learning approaches to molecular design. We applied DESMILES to a previously published benchmark that assesses the ability of a method to modify input molecules to inhibit the dopamine receptor D2, and DESMILES yielded a 77% lower failure rate compared to state-of-the-art models. To explain the ability of DESMILES to hone molecular properties, we visualize a layer of the DESMILES network, and further demonstrate this ability by using DESMILES to tailor the same molecules used in the D2 benchmark test to dock more potently against seven different receptors.
Tasks Drug Discovery
Published 2020-02-07
URL https://arxiv.org/abs/2002.02948v1
PDF https://arxiv.org/pdf/2002.02948v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-view-of-chemical-space

Optimizing Geometry Compression using Quantum Annealing

Title Optimizing Geometry Compression using Quantum Annealing
Authors Sebastian Feld, Markus Friedrich, Claudia Linnhoff-Popien
Abstract The compression of geometry data is an important aspect of bandwidth-efficient data transfer for distributed 3D computer vision applications. We propose a quantum-enabled lossy 3D point cloud compression pipeline based on the constructive solid geometry (CSG) model representation. Key parts of the pipeline are mapped to NP-complete problems for which an efficient Ising formulation suitable for the execution on a Quantum Annealer exists. We describe existing Ising formulations for the maximum clique search problem and the smallest exact cover problem, both of which are important building blocks of the proposed compression pipeline. Additionally, we discuss the properties of the overall pipeline regarding result optimality and the described Ising formulations.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13253v1
PDF https://arxiv.org/pdf/2003.13253v1.pdf
PWC https://paperswithcode.com/paper/optimizing-geometry-compression-using-quantum
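
A common QUBO encoding of maximum clique, shown here as an illustration rather than the exact Ising formulation the authors describe, rewards every selected vertex and penalizes every selected pair that is not an edge. With a penalty above 1, dropping an endpoint of a non-edge always lowers the energy, so the minima are exactly the maximum cliques. The substitution x_i = (1 + s_i)/2 turns the QUBO into an Ising model with spins s_i in {-1, +1}. Brute-force enumeration stands in for the quantum annealer here, so the sketch only works for tiny graphs.

```python
import itertools
from typing import Dict, List, Sequence, Set, Tuple

Qubo = Dict[Tuple[int, int], float]


def max_clique_qubo(n: int, edges: Set[Tuple[int, int]], penalty: float = 2.0) -> Qubo:
    """QUBO for maximum clique: -1 for every selected vertex, +penalty for every
    selected pair that is NOT an edge. Requires penalty > 1."""
    adjacent = {frozenset(e) for e in edges}
    q: Qubo = {(i, i): -1.0 for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if frozenset((i, j)) not in adjacent:
            q[(i, j)] = penalty
    return q


def energy(q: Qubo, x: Sequence[int]) -> float:
    # For binary x the diagonal term x_i * x_i equals x_i, i.e. a linear reward.
    return sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_minimum(q: Qubo, n: int) -> List[int]:
    """Exhaustive search over all 2^n assignments; a stand-in for the annealer."""
    best = min(itertools.product((0, 1), repeat=n), key=lambda x: energy(q, x))
    return [i for i, bit in enumerate(best) if bit]


if __name__ == "__main__":
    # Vertices 0-3 form a 4-clique; vertex 4 hangs off vertex 3.
    edges = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)}
    print(brute_force_minimum(max_clique_qubo(5, edges), 5))
```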

Semi-supervised Disentanglement with Independent Vector Variational Autoencoders

Title Semi-supervised Disentanglement with Independent Vector Variational Autoencoders
Authors Bo-Kyeong Kim, Sungjin Park, Geonmin Kim, Soo-Young Lee
Abstract We aim to separate the generative factors of data into two latent vectors in a variational autoencoder. One vector captures class factors relevant to target classification tasks, while the other vector captures style factors relevant to the remaining information. To learn the discrete class features, we introduce supervision using a small amount of labeled data, which can simply yet effectively reduce the effort required for hyperparameter tuning performed in existing unsupervised methods. Furthermore, we introduce a learning objective to encourage statistical independence between the vectors. We show that (i) this vector independence term exists within the result obtained on decomposing the evidence lower bound with multiple latent vectors, and (ii) encouraging such independence along with reducing the total correlation within the vectors enhances disentanglement performance. Experiments conducted on several image datasets demonstrate that the disentanglement achieved via our method can improve classification performance and generation controllability.
Published 2020-03-14
URL https://arxiv.org/abs/2003.06581v1
PDF https://arxiv.org/pdf/2003.06581v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-disentanglement-with
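
On our reading of the abstract, the vector independence term is the divergence between the joint aggregate posterior of the two latent vectors and the product of its marginals, KL(q(z1, z2) || q(z1) q(z2)), which is their mutual information. Assuming, purely for illustration, that the aggregate posterior is Gaussian with covariance Sigma, the term has the closed form 0.5 (log det Sigma_11 + log det Sigma_22 - log det Sigma), sketched below. The paper's own estimator is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def log_det_spd(a: Matrix) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def block(a: Matrix, idx: List[int]) -> Matrix:
    """Principal submatrix of `a` on the given indices."""
    return [[a[i][j] for j in idx] for i in idx]


def vector_independence_gaussian(cov: Matrix, k: int) -> float:
    """KL(q(z1, z2) || q(z1) q(z2)) for a joint Gaussian with covariance `cov`,
    where z1 holds the first k coordinates and z2 the rest."""
    first, rest = list(range(k)), list(range(k, len(cov)))
    return 0.5 * (log_det_spd(block(cov, first)) + log_det_spd(block(cov, rest))
                  - log_det_spd(cov))


if __name__ == "__main__":
    # Class vector z1 = coordinate 0, style vector z2 = coordinate 1, correlation 0.5.
    print(vector_independence_gaussian([[1.0, 0.5], [0.5, 1.0]], 1))
```

The term is zero exactly when the cross-covariance block vanishes, which is the statistical independence between class and style vectors that the objective encourages.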

Convergence analysis of particle swarm optimization using stochastic Lyapunov functions and quantifier elimination

Title Convergence analysis of particle swarm optimization using stochastic Lyapunov functions and quantifier elimination
Authors Maximilian Gerwien, Rick Voßwinkel, Hendrik Richter
Abstract This paper adds to the discussion about theoretical aspects of particle swarm stability by proposing to employ stochastic Lyapunov functions and to determine the convergence set by quantifier elimination. We present a computational procedure and show that this approach leads to reevaluation and extension of previously known stability regions for PSO using a Lyapunov approach under stagnation assumptions.
Published 2020-02-05
URL https://arxiv.org/abs/2002.01673v1
PDF https://arxiv.org/pdf/2002.01673v1.pdf
PWC https://paperswithcode.com/paper/convergence-analysis-of-particle-swarm
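
For context on the stability regions mentioned above, the following sketch simulates one particle under the stagnation assumption (fixed personal and global bests) and checks membership in the order-2 region commonly attributed to Poli (2009): -1 < w < 1 and 0 < c1 + c2 < 24(1 - w^2)/(7 - 5w). This is a previously known region, not the reevaluated or extended region derived in the paper, and the Lyapunov and quantifier-elimination machinery is not shown.

```python
import random


def in_poli_order2_region(w: float, c1: float, c2: float) -> bool:
    """Order-2 stability under stagnation: -1 < w < 1 and 0 < c1 + c2 < 24(1 - w^2) / (7 - 5w)."""
    c = c1 + c2
    return -1.0 < w < 1.0 and 0.0 < c < 24.0 * (1.0 - w * w) / (7.0 - 5.0 * w)


def stagnant_particle(w: float, c1: float, c2: float, p: float, g: float,
                      x0: float, v0: float, steps: int, rng: random.Random) -> float:
    """Simulate a 1-D particle whose personal best p and global best g never change."""
    x, v = x0, v0
    for _ in range(steps):
        r1, r2 = rng.random(), rng.random()
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = x + v
    return x


if __name__ == "__main__":
    for w, c1, c2 in [(0.5, 1.0, 1.0), (0.7298, 1.49618, 1.49618), (0.7298, 2.0, 2.0)]:
        print(w, c1, c2, in_poli_order2_region(w, c1, c2))
    # Inside the region, the particle collapses onto p = g = 0.
    print(stagnant_particle(0.5, 1.0, 1.0, 0.0, 0.0, 10.0, 0.0, 1000, random.Random(1)))
```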

Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

Title Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions
Authors Toni Karvonen, George Wynne, Filip Tronarp, Chris J. Oates, Simo Särkkä
Abstract Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become “slowly” overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.
Published 2020-01-29
URL https://arxiv.org/abs/2001.10965v2
PDF https://arxiv.org/pdf/2001.10965v2.pdf
PWC https://paperswithcode.com/paper/maximum-likelihood-estimation-and-uncertainty
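
For a noiseless Gaussian process with covariance sigma^2 * k, the log-likelihood is -(n/2) log sigma^2 - y^T K^{-1} y / (2 sigma^2) plus terms free of sigma^2; setting its derivative to zero gives the closed form sigma_hat^2 = y^T K^{-1} y / n, where K is the kernel matrix on the n inputs. The sketch computes this with a Matérn-1/2 (exponential) kernel, a Sobolev-type kernel; the fixed length-scale is an assumption, and only the scale is estimated, as in the scenario above.

```python
import math
from typing import List


def matern12(a: float, b: float, length: float = 1.0) -> float:
    """Matérn-1/2 (exponential) kernel with unit scale."""
    return math.exp(-abs(a - b) / length)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def scale_mle(xs: List[float], ys: List[float], length: float = 1.0) -> float:
    """ML estimate of sigma^2 for noiseless data under a GP with covariance sigma^2 * k."""
    gram = [[matern12(a, b, length) for b in xs] for a in xs]
    alpha = solve(gram, ys)  # K^{-1} y
    return sum(y * a for y, a in zip(ys, alpha)) / len(ys)


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.3, 2.0]
    print(scale_mle(xs, [math.sin(3 * x) for x in xs]))
```

Because the posterior variance sigma^2 (k(x, x) - k_x^T K^{-1} k_x) is linear in sigma^2, this single estimate is what widens or narrows the model's credible intervals.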

Estimation of Z-Thickness and XY-Anisotropy of Electron Microscopy Images using Gaussian Processes

Title Estimation of Z-Thickness and XY-Anisotropy of Electron Microscopy Images using Gaussian Processes
Authors Thanuja D. Ambegoda, Julien N. P. Martel, Jozef Adamcik, Matthew Cook, Richard H. R. Hahnloser
Abstract Serial section electron microscopy (ssEM) is a widely used technique for obtaining volumetric information of biological tissues at nanometer scale. However, accurate 3D reconstructions of identified cellular structures and volumetric quantifications require precise estimates of section thickness and anisotropy (or stretching) along the XY imaging plane. In fact, many image processing algorithms simply assume isotropy within the imaging plane. To ameliorate this problem, we present a method for estimating thickness and stretching of electron microscopy sections using non-parametric Bayesian regression of image statistics. We verify our thickness and stretching estimates using direct measurements obtained by atomic force microscopy (AFM) and show that our method has a lower estimation error compared to a recent indirect thickness estimation method as well as a relative Z coordinate estimation method. Furthermore, we have made the first dataset of ssSEM images with directly measured section thickness values publicly available for the evaluation of indirect thickness estimation methods.
Tasks Gaussian Processes
Published 2020-02-01
URL https://arxiv.org/abs/2002.00228v2
PDF https://arxiv.org/pdf/2002.00228v2.pdf
PWC https://paperswithcode.com/paper/estimation-of-z-thickness-and-xy-anisotropy

Dimensionality Reduction and Motion Clustering during Activities of Daily Living: 3, 4, and 7 Degree-of-Freedom Arm Movements

Title Dimensionality Reduction and Motion Clustering during Activities of Daily Living: 3, 4, and 7 Degree-of-Freedom Arm Movements
Authors Yuri Gloumakov, Adam J. Spiers, Aaron M. Dollar
Abstract The wide variety of motions performed by the human arm during daily tasks makes it desirable to find representative subsets to reduce the dimensionality of these movements for a variety of applications, including the design and control of robotic and prosthetic devices. This paper presents a novel method and the results of an extensive human subjects study to obtain representative arm joint angle trajectories that span naturalistic motions during Activities of Daily Living (ADLs). In particular, we seek to identify sets of useful motion trajectories of the upper limb that are functions of a single variable, allowing, for instance, an entire prosthetic or robotic arm to be controlled with a single input from a user, along with a means to select between motions for different tasks. Data driven approaches are used to obtain clusters as well as representative motion averages for the full-arm 7 degree of freedom (DOF), elbow-wrist 4 DOF, and wrist-only 3 DOF motions. The proposed method makes use of well-known techniques such as dynamic time warping (DTW) to obtain a divergence measure between motion segments, DTW barycenter averaging (DBA) to obtain averages, Ward’s distance criterion to build hierarchical trees, batch-DTW to simultaneously align multiple motion data, and functional principal component analysis (fPCA) to evaluate cluster variability. The clusters that emerge group the recorded motions primarily by hand start and end location for the full-arm system, by motion direction for the wrist-only system, and by an intermediate between these two qualities for the elbow-wrist system. The proposed clustering methodology is justified by comparing results against alternative approaches.
Tasks Dimensionality Reduction
Published 2020-02-17
URL https://arxiv.org/abs/2003.02641v1
PDF https://arxiv.org/pdf/2003.02641v1.pdf
PWC https://paperswithcode.com/paper/dimensionality-reduction-and-motion
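
The divergence measure the method starts from, dynamic time warping, aligns two trajectories of different lengths by dynamic programming. The sketch below is a textbook DTW over joint-angle frames with a Euclidean local cost; the frame layout is an assumption, and the DBA averaging, Ward linkage, batch-DTW, and fPCA steps are not covered.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # joint angles at one time step (e.g. 3, 4, or 7 values)


def dtw(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two joint-angle trajectories.

    Each cell adds the Euclidean distance between two frames to the cheapest
    of the three admissible predecessor alignments.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    slow = [(0.0,), (0.0,), (1.0,), (2.0,)]  # same motion, held at the start
    fast = [(0.0,), (1.0,), (2.0,)]
    print(dtw(fast, slow))
```

A pairwise matrix of such distances is the kind of input a hierarchical clustering step with Ward's criterion would consume.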

Activation Density driven Energy-Efficient Pruning in Training

Title Activation Density driven Energy-Efficient Pruning in Training
Authors Timothy Foldy-Porto, Priyadarshini Panda
Abstract The process of neural network pruning with suitable fine-tuning and retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typically, pruning methods require large, pre-trained networks as a starting point from which they perform a time-intensive iterative pruning and retraining algorithm. We propose a novel pruning in-training method that prunes a network in real time during training, reducing the overall training time needed to achieve an optimal compressed network. To do so, we introduce an activation density based analysis that identifies the optimal relative sizing or compression for each layer of the network. Our method removes the need for pre-training and is architecture agnostic, allowing it to be employed on a wide variety of systems. For VGG-19 and ResNet18 on CIFAR-10, CIFAR-100, and TinyImageNet, we obtain exceedingly sparse networks (up to 200x reduction in parameters and >60x reduction in inference compute operations in the best case) with comparable accuracies (up to 2%-3% loss with respect to the baseline network). By reducing the network size periodically during training, we achieve total training times that are shorter than those of previously proposed pruning methods. Furthermore, training compressed networks at different epochs with our proposed method yields considerable reduction in training compute complexity (1.6x-3.2x lower) at near iso-accuracy as compared to a baseline network trained entirely from scratch.
Tasks Network Pruning
Published 2020-02-07
URL https://arxiv.org/abs/2002.02949v1
PDF https://arxiv.org/pdf/2002.02949v1.pdf
PWC https://paperswithcode.com/paper/activation-density-driven-energy-efficient
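
Here we take activation density to mean the fraction of non-zero post-ReLU outputs in a layer. The sketch computes it over a batch and applies a hypothetical sizing rule that shrinks each layer's width in proportion to its density; the exact rule and pruning schedule used in the paper may differ.

```python
import math
from typing import List, Sequence


def activation_density(activations: Sequence[Sequence[float]]) -> float:
    """Fraction of non-zero post-ReLU activations in one layer, over a batch of samples."""
    total = sum(len(row) for row in activations)
    nonzero = sum(1 for row in activations for a in row if a != 0.0)
    return nonzero / total


def resized_widths(widths: List[int], densities: List[float], min_width: int = 1) -> List[int]:
    """Hypothetical rule: shrink each layer's width in proportion to its activation density."""
    return [max(min_width, math.ceil(w * d)) for w, d in zip(widths, densities)]


if __name__ == "__main__":
    # Two samples, four units each, after ReLU (toy values).
    layer_out = [[0.0, 1.2, 0.0, 0.3], [0.0, 0.0, 2.1, 0.0]]
    density = activation_density(layer_out)  # 3 of 8 units active
    print(density, resized_widths([64], [density]))
```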

PointINS: Point-based Instance Segmentation

Title PointINS: Point-based Instance Segmentation
Authors Lu Qi, Xiangyu Zhang, Yingcong Chen, Yukang Chen, Jian Sun, Jiaya Jia
Abstract A single-point feature has shown its effectiveness in object detection. However, for instance segmentation, it does not lead to satisfactory results. There are two reasons. First, it has limited representation capacity. Second, it could be misaligned with potential instances. To address these issues, we propose a new point-based framework, namely PointINS, to segment instances from single points. The core module of our framework is instance-aware convolution, which consists of the instance-agnostic feature and instance-aware weights. The instance-agnostic feature for each Point-of-Interest (PoI) serves as a template for potential instance masks. In this way, instance-aware features are computed by convolving this template with instance-aware weights for subsequent mask prediction. Given the independence of instance-aware convolution, PointINS is general and practical as a one-stage detector for anchor-based and anchor-free frameworks. In our extensive experiments, we show the effectiveness of our framework on RetinaNet and FCOS. With a ResNet101 backbone, PointINS achieves 38.3 mask mAP on the challenging COCO dataset, outperforming its competitors by a large margin. The code will be made publicly available.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2020-03-13
URL https://arxiv.org/abs/2003.06148v1
PDF https://arxiv.org/pdf/2003.06148v1.pdf
PWC https://paperswithcode.com/paper/pointins-point-based-instance-segmentation
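
The instance-aware convolution can be illustrated with a simplified dynamic 1x1 convolution: one shared, instance-agnostic template feature map is combined with weights predicted separately for each instance, so different instances yield different features from the same template. The 1x1 kernel size, the nested-list tensor layout, and the function name are assumptions; weight prediction and the mask head are omitted.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def instance_aware_conv1x1(template: Tensor3, weights: List[List[float]]) -> Tensor3:
    """Apply per-instance 1x1 convolution weights [out_ch][in_ch] to a shared template."""
    in_ch = len(template)
    h, w = len(template[0]), len(template[0][0])
    out: Tensor3 = []
    for w_out in weights:
        if len(w_out) != in_ch:
            raise ValueError("weight row must have one entry per template channel")
        out.append([[sum(w_out[c] * template[c][y][x] for c in range(in_ch))
                     for x in range(w)] for y in range(h)])
    return out


if __name__ == "__main__":
    template = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels, 1x2 spatial map for one PoI
    print(instance_aware_conv1x1(template, [[1.0, 0.0]]))  # weights for instance A
    print(instance_aware_conv1x1(template, [[0.5, 0.5]]))  # weights for instance B
```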

Pruning CNN’s with linear filter ensembles

Title Pruning CNN’s with linear filter ensembles
Authors Csanád Sándor, Szabolcs Pável, Lehel Csató
Abstract Despite the promising results of convolutional neural networks (CNNs), their application on devices with limited resources is still a big challenge; this is mainly due to the huge memory and computation requirements of CNNs. To counter the limitation imposed by the network size, we use pruning to reduce the network size and, implicitly, the number of floating point operations (FLOPs). Contrary to the filter norm method used in conventional network pruning, which is based on the assumption that a smaller norm implies “less importance” of its associated component, we develop a novel filter importance norm that is based on the change in the empirical loss caused by the presence or removal of a component from the network architecture. Since there are too many individual possibilities for filter configuration, we repeatedly sample from these architectural components and measure the system performance in the respective state of components being active or disabled. The result is a collection of filter ensembles (filter masks) and associated performance values. We rank the filters based on a linear and additive model and remove the least important ones such that the drop in network accuracy is minimal. We evaluate our method on a fully connected network, as well as on the ResNet architecture trained on the CIFAR-10 dataset. Using our pruning method, we managed to remove 60% of the parameters and 64% of the FLOPs from the ResNet with an accuracy drop of less than 0.6%.
Tasks Network Pruning
Published 2020-01-22
URL https://arxiv.org/abs/2001.08142v2
PDF https://arxiv.org/pdf/2001.08142v2.pdf
PWC https://paperswithcode.com/paper/pruning-cnns-with-linear-filter-ensembles
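
The ranking step can be sketched as follows: sample random binary filter masks, record a performance score for each, fit score = b0 + sum_i w_i * mask_i by least squares, and treat filters with small w_i as least important. The keep probability, sample count, and the `evaluate` callback (which in practice would run the masked network on validation data) are hypothetical; the toy evaluator below is exactly linear, so the fit recovers its weights.

```python
import random
from typing import Callable, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rank_filters(evaluate: Callable[[List[int]], float], n_filters: int,
                 n_samples: int, keep_prob: float, rng: random.Random) -> List[int]:
    """Return filter indices ordered from least to most important."""
    rows, scores = [], []
    for _ in range(n_samples):
        # A random filter ensemble: 1 keeps a filter active, 0 disables it.
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(n_filters)]
        rows.append([1.0] + [float(bit) for bit in mask])  # leading 1.0 is the intercept
        scores.append(evaluate(mask))
    # Least squares via the normal equations (X^T X) beta = X^T y.
    p = n_filters + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(p)]
    weights = solve(xtx, xty)[1:]  # additive contribution of each filter
    return sorted(range(n_filters), key=lambda i: weights[i])


def toy_evaluate(mask: List[int]) -> float:
    """Hypothetical accuracy: baseline plus a fixed contribution per active filter."""
    contributions = [0.5, 0.05, 0.3, 0.01]
    return 0.1 + sum(c * bit for c, bit in zip(contributions, mask))


if __name__ == "__main__":
    print(rank_filters(toy_evaluate, 4, 200, 0.5, random.Random(42)))
```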