Paper Group ANR 258
Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors
Title | Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors |
Authors | Zhaoqiang Liu, Selwyn Gomes, Avtansh Tiwari, Jonathan Scarlett |
Abstract | The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with generative models. We first consider noiseless 1-bit measurements, and provide sample complexity bounds for approximate recovery under i.i.d. Gaussian measurements and a Lipschitz continuous generative prior, as well as a near-matching algorithm-independent lower bound. Moreover, we demonstrate that the Binary $\epsilon$-Stable Embedding property, which characterizes the robustness of the reconstruction to measurement errors and noise, also holds for 1-bit compressive sensing with Lipschitz continuous generative models with sufficiently many Gaussian measurements. In addition, we apply our results to neural network generative models, and provide a proof-of-concept numerical experiment demonstrating significant improvements over sparsity-based approaches. |
Tasks | Compressive Sensing |
Published | 2020-02-05 |
URL | https://arxiv.org/abs/2002.01697v2 |
https://arxiv.org/pdf/2002.01697v2.pdf | |
PWC | https://paperswithcode.com/paper/sample-complexity-bounds-for-1-bit |
Repo | |
Framework | |
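The measurement model in this paper is compact enough to state in code. Below is a minimal numpy sketch, assuming a toy random ReLU network in place of a trained generative prior: noiseless 1-bit measurements are the signs of i.i.d. Gaussian linear measurements of a signal in the range of the model. The latent-space random search at the end is only a crude illustrative baseline, not the recovery procedure the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, m = 10, 100, 500            # latent dim, ambient dim, measurements

# Toy Lipschitz generative model: a fixed two-layer ReLU network
# (a stand-in for a trained generative prior, not the paper's model).
W1, W2 = rng.normal(size=(50, k)), rng.normal(size=(n, 50))
def G(z):
    return W2 @ np.maximum(W1 @ z, 0.0)

x_star = G(rng.normal(size=k))    # unknown signal in the range of G
A = rng.normal(size=(m, n))       # i.i.d. Gaussian measurement matrix
y = np.sign(A @ x_star)           # noiseless 1-bit measurements

# Crude baseline: pick the random latent code whose 1-bit measurements
# agree most often with y. Sign measurements destroy magnitude, so
# recovery is only up to scale, hence the cosine-similarity check.
zs = rng.normal(size=(2000, k))
z_hat = max(zs, key=lambda z: np.mean(np.sign(A @ G(z)) == y))
x_hat = G(z_hat) / np.linalg.norm(G(z_hat))
print("cosine similarity:", x_hat @ x_star / np.linalg.norm(x_star))
```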
A Differential-form Pullback Programming Language for Higher-order Reverse-mode Automatic Differentiation
Title | A Differential-form Pullback Programming Language for Higher-order Reverse-mode Automatic Differentiation |
Authors | Carol Mak, Luke Ong |
Abstract | Building on the observation that reverse-mode automatic differentiation (AD) – a generalisation of backpropagation – can naturally be expressed as pullbacks of differential 1-forms, we design a simple higher-order programming language with a first-class differential operator, and present a reduction strategy which exactly simulates reverse-mode AD. We justify our reduction strategy by interpreting our language in any differential $\lambda$-category that satisfies the Hahn-Banach Separation Theorem, and show that the reduction strategy precisely captures reverse-mode AD in a truly higher-order setting. |
Tasks | |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08241v1 |
https://arxiv.org/pdf/2002.08241v1.pdf | |
PWC | https://paperswithcode.com/paper/a-differential-form-pullback-programming |
Repo | |
Framework | |
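The paper's starting point, that reverse-mode AD computes the pullback of a differential 1-form along a smooth map, can be made concrete with a toy interpreter. The sketch below is my illustration of that reading, not the paper's language or semantics: each primitive carries its map together with the pullback w -> J^T w, and composing maps composes pullbacks in reverse order, which is exactly backpropagation.

```python
import numpy as np

class Smooth:
    """A smooth map bundled with the pullback of 1-forms along it (a VJP)."""
    def __init__(self, f, pullback):
        self.f, self.pullback = f, pullback

    def __matmul__(self, other):               # composition: self after other
        def f(x):
            return self.f(other.f(x))
        def pullback(x, w):                    # pull w back through both maps
            return other.pullback(x, self.pullback(other.f(x), w))
        return Smooth(f, pullback)

# Primitives carry f together with the pullback w |-> J_f(x)^T w.
sin = Smooth(np.sin, lambda x, w: np.cos(x) * w)
square = Smooth(lambda x: x * x, lambda x, w: 2 * x * w)

g = sin @ square                               # g(x) = sin(x^2)
x = np.array([1.0, 2.0])
grad = g.pullback(x, np.ones_like(x))          # pull back the 1-form dy
print(np.allclose(grad, 2 * x * np.cos(x * x)))  # True: d/dx sin(x^2)
```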
sKPNSGA-II: Knee point based MOEA with self-adaptive angle for Mission Planning Problems
Title | sKPNSGA-II: Knee point based MOEA with self-adaptive angle for Mission Planning Problems |
Authors | Cristian Ramirez-Atencia, Sanaz Mostaghim, David Camacho |
Abstract | Real-world, complex problems usually have many objective functions that must be optimized simultaneously. Over the last decades, Multi-Objective Evolutionary Algorithms (MOEAs) have been designed to solve this kind of problem. Nevertheless, some problems have many objectives, which leads to a large number of non-dominated solutions obtained by the optimization algorithms. The large set of non-dominated solutions hinders the selection of the most appropriate solution by the decision maker. This paper presents a new algorithm designed to obtain the most significant solutions from the Pareto Optimal Frontier (POF). This approach is based on cone-domination applied to MOEAs, which can find the knee-point solutions. In order to obtain the best cone angle, we propose a hypervolume-distribution metric, which is used to self-adapt the angle during the evolutionary process. This new algorithm has been applied to a real-world application: the Unmanned Air Vehicle (UAV) Mission Planning Problem. The experimental results show a significant improvement in algorithm performance in terms of hypervolume, number of solutions, and the number of generations required to converge. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08867v1 |
https://arxiv.org/pdf/2002.08867v1.pdf | |
PWC | https://paperswithcode.com/paper/skpnsga-ii-knee-point-based-moea-with-self |
Repo | |
Framework | |
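Cone-domination, the mechanism at the heart of this algorithm, widens the usual Pareto-domination cone so that knee points dominate the flat extremes of the front. The sketch below uses one common bi-objective parameterisation of cone-domination with a fixed angle; the paper's contribution is self-adapting this angle via a hypervolume-distribution metric.

```python
import numpy as np

def cone_dominates(a, b, angle_deg=20.0):
    """True if objective vector a cone-dominates b (both minimised)."""
    s = np.tan(np.radians(angle_deg))
    M = np.array([[1.0, s],
                  [s, 1.0]])                   # widened domination cone
    fa, fb = M @ np.asarray(a), M @ np.asarray(b)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

# On a convex front, a knee point cone-dominates an extreme solution
# even though neither Pareto-dominates the other.
knee, extreme = [0.2, 0.2], [0.05, 0.9]
print(cone_dominates(knee, extreme))           # True
print(cone_dominates(extreme, knee))           # False
```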
On the Discrepancy between Density Estimation and Sequence Generation
Title | On the Discrepancy between Density Estimation and Sequence Generation |
Authors | Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho |
Abstract | Many sequence-to-sequence generation tasks, including machine translation and text-to-speech, can be posed as estimating the density of the output y given the input x: $p(y \mid x)$. Given this interpretation, it is natural to evaluate sequence-to-sequence models using conditional log-likelihood on a test set. However, the goal of sequence-to-sequence generation (or structured prediction) is to find the best output $\hat{y}$ given an input x, and each task has its own downstream metric R that scores a model output by comparing against a set of references $y^*$: $R(\hat{y}, y^* \mid x)$. While we hope that a model that excels in density estimation also performs well on the downstream metric, the exact correlation has not been studied for sequence generation tasks. In this paper, by comparing several density estimators on five machine translation tasks, we find that the correlation between rankings of models based on log-likelihood and BLEU varies significantly depending on the range of the model families being compared. First, log-likelihood is highly correlated with BLEU when we consider models within the same family (e.g. autoregressive models, or latent variable models with the same parameterization of the prior). However, we observe no correlation between rankings of models across different families: (1) among non-autoregressive latent variable models, a flexible prior distribution is better at density estimation but gives worse generation quality than a simple prior, and (2) autoregressive models offer the best translation performance overall, while latent variable models with a normalizing flow prior give the highest held-out log-likelihood across all datasets. Therefore, we recommend using a simple prior for the latent variable non-autoregressive model when fast generation speed is desired. |
Tasks | Density Estimation, Latent Variable Models, Machine Translation, Structured Prediction |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07233v1 |
https://arxiv.org/pdf/2002.07233v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-discrepancy-between-density-estimation |
Repo | |
Framework | |
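The paper's core measurement, agreement between log-likelihood rankings and BLEU rankings over a set of models, is just a rank correlation. A minimal sketch with made-up placeholder scores (not the paper's numbers):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of ranks (no ties)."""
    rx, ry = np.argsort(np.argsort(x)), np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical scores: within one model family the rankings agree...
ll_same_family   = [-80.1, -78.5, -77.2, -76.9]
bleu_same_family = [24.1, 25.0, 26.3, 26.8]
print(spearman(ll_same_family, bleu_same_family))    # 1.0

# ...but across families a better density estimator can generate worse.
ll_cross_family   = [-75.0, -77.0, -79.0]  # flow prior wins on likelihood
bleu_cross_family = [23.0, 24.5, 27.0]     # autoregressive wins on BLEU
print(spearman(ll_cross_family, bleu_cross_family))  # -1.0
```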
Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges
Title | Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges |
Authors | Triet H. M. Le, Hao Chen, M. Ali Babar |
Abstract | Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding have become so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation. To address the limitations of traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. After that, we introduce recent DL mechanisms suitable for solving such problems. Finally, we present state-of-the-art practices, discuss their challenges, and offer recommendations for practitioners and researchers. |
Tasks | Language Modelling, Machine Translation |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05442v1 |
https://arxiv.org/pdf/2002.05442v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-source-code-modeling-and |
Repo | |
Framework | |
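The survey's unifying device is the encoder-decoder framework: a source-code token sequence is encoded into a context from which a target sequence (e.g. a repaired program or a summary) is decoded. A minimal PyTorch skeleton of that framing, with all sizes and task choices being illustrative assumptions:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))       # encode code tokens
        dec, _ = self.decoder(self.tgt_emb(tgt), h)  # condition decoder on h
        return self.out(dec)                         # next-token logits

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 12))   # e.g. tokens of buggy code
tgt = torch.randint(0, 1000, (2, 9))    # e.g. tokens of fixed code
print(model(src, tgt).shape)            # torch.Size([2, 9, 1000])
```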
Learning Coupled Policies for Simultaneous Machine Translation
Title | Learning Coupled Policies for Simultaneous Machine Translation |
Authors | Philip Arthur, Trevor Cohn, Gholamreza Haffari |
Abstract | In simultaneous machine translation, the system must incrementally generate the output translation before the input sentence ends. This is a coupled decision process consisting of a programmer and an interpreter: the programmer's policy decides when to WRITE the next output word or READ the next input word, and the interpreter's policy decides which word to write. We present an imitation learning (IL) approach to efficiently learn effective coupled programmer-interpreter policies. To enable IL, we present an algorithmic oracle that produces oracle READ/WRITE actions for training bilingual sentence pairs using the notion of word alignments. We attribute the effectiveness of the learned coupled policies to (i) scheduled sampling addressing the coupled exposure bias, and (ii) the quality of oracle actions capturing enough information from the partial input before writing the output. Experiments show our method outperforms strong baselines in terms of translation quality and delay, when translating from German/Arabic/Czech/Bulgarian/Romanian to English. |
Tasks | Imitation Learning, Machine Translation |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04306v1 |
https://arxiv.org/pdf/2002.04306v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-coupled-policies-for-simultaneous |
Repo | |
Framework | |
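The programmer-interpreter decomposition described in the abstract is an interleaved decision loop. The sketch below uses stub policies (a fixed wait-2 programmer and an echoing interpreter) purely to show the control flow; the paper instead learns both policies with imitation learning from an alignment-based oracle.

```python
def simultaneous_translate(source, programmer, interpreter, max_len=50):
    """Interleave READ/WRITE decisions (programmer) with word choices
    (interpreter) until the end-of-sentence token is written."""
    read, output = 0, []
    while len(output) < max_len:
        action = programmer(source[:read], output)
        if action == "READ" and read < len(source):
            read += 1                          # consume one more source word
        else:
            word = interpreter(source[:read], output)
            output.append(word)                # commit one target word
            if word == "</s>":
                break
    return output

# Toy policies: stay two words ahead of the output, then echo the source.
programmer = lambda src, out: "READ" if len(src) < len(out) + 2 else "WRITE"
interpreter = lambda src, out: src[len(out)] if len(out) < len(src) else "</s>"
print(simultaneous_translate(["ich", "sehe", "den", "hund"],
                             programmer, interpreter))
```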
Importance-Driven Deep Learning System Testing
Title | Importance-Driven Deep Learning System Testing |
Authors | Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, Alper Cakan |
Abstract | Deep Learning (DL) systems are key enablers for engineering intelligent applications due to their ability to solve complex tasks such as image recognition and machine translation. Nevertheless, using DL systems in safety- and security-critical applications requires providing testing evidence for their dependable operation. Recent research in this direction focuses on adapting testing criteria from traditional software engineering as a means of increasing confidence in their correct behaviour. However, these criteria are inadequate in capturing the intrinsic properties exhibited by DL systems. We bridge this gap by introducing DeepImportance, a systematic testing methodology accompanied by an Importance-Driven (IDC) test adequacy criterion for DL systems. Applying IDC makes it possible to establish a layer-wise functional understanding of the importance of DL system components and to use this information to assess the semantic diversity of a test set. Our empirical evaluation on several DL systems, across multiple DL datasets and with state-of-the-art adversarial generation techniques, demonstrates the usefulness and effectiveness of DeepImportance and its ability to support the engineering of more robust DL systems. |
Tasks | Machine Translation |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03433v1 |
https://arxiv.org/pdf/2002.03433v1.pdf | |
PWC | https://paperswithcode.com/paper/importance-driven-deep-learning-system |
Repo | |
Framework | |
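As a rough illustration of what an importance-driven adequacy criterion measures, the sketch below ranks a layer's neurons by a crude importance proxy, bins each important neuron's training-time activations, and reports how many bin combinations a test set reaches. This is a hypothetical stand-in for DeepImportance's actual procedure, which derives importance via relevance propagation and uses proper clustering.

```python
import numpy as np

rng = np.random.default_rng(0)
train_acts = rng.normal(size=(1000, 32))  # one layer's activations, train set
test_acts = rng.normal(size=(50, 32))     # same layer on a candidate test set

# 1. Rank neurons by a crude importance proxy (activation variance here;
#    the paper derives importance from the model's decisions instead).
important = np.argsort(train_acts.var(axis=0))[-3:]

# 2. Split each important neuron's training activations into 4 bins
#    (quartile edges stand in for the paper's clustering step).
edges = [np.quantile(train_acts[:, i], [0.25, 0.5, 0.75]) for i in important]

# 3. Adequacy: fraction of important-neuron bin combinations covered.
combos = {tuple(int(np.digitize(x[i], e)) for i, e in zip(important, edges))
          for x in test_acts}
print(f"covered {len(combos)} / {4 ** len(important)} combinations")
```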
FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA
Title | FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA |
Authors | Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar |
Abstract | Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inference implementation is quite slow; it takes a few minutes to generate just one second of audio on a high-end GPU. In this work, we develop the first accelerator platform, FastWave, for autoregressive convolutional neural networks, and address the associated design challenges. We design the Fast-Wavenet inference model in Vivado HLS and perform a wide range of optimizations including fixed-point implementation, array partitioning and pipelining. Our model uses a fully parameterized parallel architecture for fast matrix-vector multiplication that enables per-layer customized latency fine-tuning for further throughput improvement. Our experiments comparatively assess the trade-off between throughput and resource utilization for various optimizations. Our best WaveNet design on the Xilinx XCVU13P FPGA, which uses only on-chip memory, achieves 66$\times$ faster generation than a CPU implementation and 11$\times$ faster generation than a GPU implementation. |
Tasks | Audio Generation, Language Modelling, Machine Translation |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.04971v1 |
https://arxiv.org/pdf/2002.04971v1.pdf | |
PWC | https://paperswithcode.com/paper/fastwave-accelerating-autoregressive |
Repo | |
Framework | |
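The Fast-Wavenet inference model mentioned above avoids recomputing dilated convolutions from scratch for every generated sample: each layer keeps a queue of past activations, so one new sample costs one matrix-vector product per layer, which is precisely the operation the FPGA architecture parallelizes. A minimal numpy sketch of the queue mechanism, with shapes and nonlinearity as illustrative assumptions:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
channels, n_layers = 4, 3
dilations = [2 ** l for l in range(n_layers)]          # 1, 2, 4
weights = [0.1 * rng.normal(size=(channels, 2 * channels))
           for _ in range(n_layers)]                   # kernel size 2

# Per-layer queues hold the input activation seen `dilation` steps ago.
queues = [deque([np.zeros(channels)] * d, maxlen=d) for d in dilations]

def generate_step(x):
    """Advance the dilated-conv stack by one sample: 1 mat-vec per layer."""
    for W, q in zip(weights, queues):
        past = q[0]                 # input from `dilation` steps back
        q.append(x)                 # push the current input, evict oldest
        x = np.tanh(W @ np.concatenate([past, x]))
    return x

x = np.zeros(channels)
for _ in range(5):                  # autoregressive generation loop
    x = generate_step(x)
print(x)
```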
Learning Light Field Angular Super-Resolution via a Geometry-Aware Network
Title | Learning Light Field Angular Super-Resolution via a Geometry-Aware Network |
Authors | Jing Jin, Junhui Hou, Hui Yuan, Sam Kwong |
Abstract | The acquisition of light field images with high angular resolution is costly. Although many methods have been proposed to improve the angular resolution of a sparsely-sampled light field, they always focus on light fields with a small baseline, as captured by consumer light field cameras. By making full use of the intrinsic geometry information of light fields, in this paper we propose an end-to-end learning-based approach aimed at angularly super-resolving a sparsely-sampled light field with a large baseline. Our model consists of two learnable modules and a physically-based module. Specifically, it includes a depth estimation module for explicitly modeling the scene geometry, a physically-based warping module for novel view synthesis, and a light field blending module specifically designed for light field reconstruction. Moreover, we introduce a novel loss function to promote the preservation of the light field parallax structure. Experimental results on various light field datasets, including large-baseline light field images, demonstrate the significant superiority of our method over state-of-the-art ones: our method improves PSNR over the second-best method by up to 2 dB on average, while reducing execution time by a factor of 48. In addition, our method preserves the light field parallax structure better. |
Tasks | Depth Estimation, Super-Resolution |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11263v1 |
https://arxiv.org/pdf/2002.11263v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-light-field-angular-super-resolution |
Repo | |
Framework | |
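The physically-based module in this pipeline is a disparity-based warp: given per-pixel disparity from the depth estimation module, a source sub-aperture view is resampled at positions shifted proportionally to the angular offset of the novel view. A minimal nearest-neighbour sketch (the paper's warping is differentiable, and the blending module then fuses the warped views):

```python
import numpy as np

def warp_view(src, disparity, du, dv):
    """Backward-warp `src` to angular offset (du, dv), nearest-neighbour."""
    h, w = src.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    su = np.clip(np.round(u + du * disparity).astype(int), 0, w - 1)
    sv = np.clip(np.round(v + dv * disparity).astype(int), 0, h - 1)
    return src[sv, su]

rng = np.random.default_rng(0)
src = rng.random((64, 64))
disparity = np.full((64, 64), 1.5)      # a fronto-parallel plane
novel = warp_view(src, disparity, du=2.0, dv=0.0)  # 3-pixel horizontal shift
print(np.allclose(novel[:, :-3], src[:, 3:]))      # True
```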
Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks
Title | Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks |
Authors | Cedric De Boom, Stephanie Van Laere, Tim Verbelen, Bart Dhoedt |
Abstract | Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.10266v1 |
https://arxiv.org/pdf/2002.10266v1.pdf | |
PWC | https://paperswithcode.com/paper/rhythm-chord-and-melody-generation-for-lead |
Repo | |
Framework | |
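The two-stage factorisation is the key structural idea here: generate the song-level harmonic and rhythmic template first, then generate melody notes conditioned on it. The sketch below shows only this pipeline shape, with random stubs standing in for the paper's LSTM models:

```python
import random

random.seed(0)
CHORDS = ["C", "F", "G", "Am"]
SCALE = ["C", "D", "E", "F", "G", "A", "B"]

def stage1_template(n_bars):
    """Stage 1: a harmonic and rhythmic template for the whole song."""
    return [{"chord": random.choice(CHORDS),
             "onsets": sorted(random.sample(range(8), k=4))}
            for _ in range(n_bars)]

def stage2_melody(template):
    """Stage 2: one melody note per onset, conditioned on the template."""
    return [[(t, random.choice(SCALE)) for t in bar["onsets"]]
            for bar in template]

template = stage1_template(n_bars=4)
for bar, notes in zip(template, stage2_melody(template)):
    print(bar["chord"], notes)
```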
A Provably Robust Multiple Rotation Averaging Scheme for SO(2)
Title | A Provably Robust Multiple Rotation Averaging Scheme for SO(2) |
Authors | Tyler Maunu, Gilad Lerman |
Abstract | We give adversarial robustness results for synchronization on the rotation group over $\mathbb{R}^2$, $\mathrm{SO}(2)$. In particular, we consider an adversarial corruption setting, where an adversary can choose which measurements to corrupt as well as what to corrupt them to. In this setting, we first show that some common nonconvex formulations, which are categorized as “multiple rotation averaging”, may fail. We then discuss a new fast algorithm, called Trimmed Averaging Synchronization, which has exact recovery and linear convergence up to an outlier percentage of $1/4$. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05299v1 |
https://arxiv.org/pdf/2002.05299v1.pdf | |
PWC | https://paperswithcode.com/paper/a-provably-robust-multiple-rotation-averaging |
Repo | |
Framework | |
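On SO(2), rotations are angles and uncorrupted relative measurements satisfy theta_ij = theta_i - theta_j, so the synchronization update has a particularly simple form. The sketch below illustrates the trimmed-averaging idea, discarding the most discrepant neighbour corrections before averaging, on a locally-initialised toy instance; it is an illustration of the principle, not the paper's exact Trimmed Averaging Synchronization algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
theta = rng.uniform(0, 2 * np.pi, n)                 # ground-truth rotations
rel = theta[:, None] - theta[None, :]                # relative measurements
corrupt = rng.random((n, n)) < 0.15                  # adversarial edges
rel = np.where(corrupt, rng.uniform(0, 2 * np.pi, (n, n)), rel)

def wrap(a):                                         # wrap angles to (-pi, pi]
    return np.angle(np.exp(1j * a))

est = theta + rng.normal(0, 0.3, n)                  # rough initial estimates
for _ in range(30):
    resid = wrap(est[None, :] + rel - est[:, None])  # neighbour corrections
    keep = int(0.75 * n)                             # trim the worst 25%
    idx = np.argsort(np.abs(resid), axis=1)[:, :keep]
    est += np.take_along_axis(resid, idx, axis=1).mean(axis=1)

err = wrap(est - theta)                              # defined up to a shift
print("max error after alignment:", np.abs(wrap(err - err.mean())).max())
```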
A sequential resource investment planning framework using reinforcement learning and simulation-based optimization: A case study on microgrid storage expansion
Title | A sequential resource investment planning framework using reinforcement learning and simulation-based optimization: A case study on microgrid storage expansion |
Authors | S. Tsianikas, N. Yousefi, J. Zhou, M. Rodgers, D. W. Coit |
Abstract | A model and expansion plan have been developed to optimally determine microgrid designs as they evolve to dynamically react to changing conditions and to exploit energy storage capabilities. In the highly electrified future ahead of us, the role of energy storage is crucial wherever distributed generation is abundant, such as in microgrid settings. Given the variety of storage options that are becoming more economical, determining which type of storage technology to invest in, along with the appropriate timing and capacity, becomes a critical research question. In problems like this one, where investment timing is of high priority, developing analytical and systematic frameworks for rigorously considering these issues is indispensable. From a business perspective, such strategic frameworks aim to optimize the process of investment planning by leveraging novel approaches and by capturing problem details that traditional approaches are unable to capture. Reinforcement learning algorithms have recently proven successful in problems where sequential decision-making is inherent. In the operations planning area, these algorithms are already used, but mostly in short-term problems with well-defined constraints and low levels of uncertainty modeling. In this work, by contrast, we expand and tailor these techniques to long-term investment planning by utilizing model-free approaches, such as the Q-learning algorithm, combined with simulation-based models. We find that specific types of energy storage units, including the vanadium-redox battery, can be expected to be at the core of future microgrid applications and therefore require further attention. Another key finding is that the optimal storage capacity threshold for a system depends heavily on the price movements of the available storage units in the market. |
Tasks | Decision Making, Q-Learning |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.03507v1 |
https://arxiv.org/pdf/2001.03507v1.pdf | |
PWC | https://paperswithcode.com/paper/a-sequential-resource-investment-planning |
Repo | |
Framework | |
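The model-free workhorse referenced in the abstract is the tabular Q-learning update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The sketch below applies it to a made-up storage-expansion toy, where states are installed-capacity levels and the reward trades a hypothetical investment cost against outage cost; the paper couples such an agent with detailed simulation-based microgrid models instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, actions = 5, ("wait", "expand")        # installed-capacity levels
Q = np.zeros((n_states, len(actions)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    """Toy environment: expanding costs money, low capacity risks outages."""
    s2 = min(s + a, n_states - 1)                # action 1 adds one unit
    outage = rng.random() < 0.3 * (1 - s2 / (n_states - 1))
    return s2, -5.0 * a - 20.0 * outage          # investment vs outage cost

s = 0
for t in range(1, 20001):
    a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
    s2, r = step(s, int(a))
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # Q-learning
    s = 0 if t % 200 == 0 else s2                # occasional episode reset

print("greedy policy:", [actions[i] for i in Q.argmax(axis=1)])
```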
Hardware Architecture Proposal for TEDA algorithm to Data Streaming Anomaly Detection
Title | Hardware Architecture Proposal for TEDA algorithm to Data Streaming Anomaly Detection |
Authors | Lucileide M. D. da Silva, Maria G. F. Coutinho, Carlos E. B. Santos, Mailson R. Santos, Luiz Affonso Guedes, M. Dolores Ruiz, Marcelo A. C. Fernandes |
Abstract | The amount of real-time data available today, such as time series and streaming data, continues to grow. Being able to analyze this data the moment it arrives can bring immense added value, but it also requires considerable computational effort and new acceleration techniques. As a possible solution to this problem, this paper proposes a hardware architecture for the Typicality and Eccentricity Data Analytic (TEDA) algorithm, implemented on Field Programmable Gate Arrays (FPGAs), for use in data streaming anomaly detection. TEDA is based on a new approach to outlier detection in the data stream context. In order to validate the proposal, occupation and throughput results for the proposed hardware are presented, along with bit-accurate simulation results. The project targets the Xilinx Virtex-6 xc6vlx240t-1ff1156 FPGA. |
Tasks | Anomaly Detection, Outlier Detection, Time Series |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03837v1 |
https://arxiv.org/pdf/2003.03837v1.pdf | |
PWC | https://paperswithcode.com/paper/hardware-architecture-proposal-for-teda |
Repo | |
Framework | |
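TEDA's appeal for hardware is that everything is recursive: the mean and variance are updated per sample, and the eccentricity of the new sample is then compared against an m-sigma-style threshold. A minimal scalar sketch of these recursions, with the sensitivity m=2 chosen as an assumption suitable for this short toy stream:

```python
def teda_stream(xs, m=2.0):
    """Yield (sample, is_anomaly) using TEDA's recursive eccentricity."""
    mu, var = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        mu = ((k - 1) * mu + x) / k                            # recursive mean
        if k > 1:
            var = (k - 1) / k * var + (x - mu) ** 2 / (k - 1)  # recursive var
        if k > 2 and var > 0:
            ecc = 1 / k + (mu - x) ** 2 / (k * var)            # eccentricity
            yield x, ecc / 2 > (m ** 2 + 1) / (2 * k)          # m-sigma rule
        else:
            yield x, False

stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1, 9.9]
print([x for x, flagged in teda_stream(stream) if flagged])    # [25.0]
```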
HMANet: Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images
Title | HMANet: Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images |
Authors | Ruigang Niu |
Abstract | Semantic segmentation of very high resolution (VHR) aerial images is one of the most challenging tasks in remote sensing image understanding. Most current approaches are based on deep convolutional neural networks (DCNNs) because of their remarkable ability to learn feature representations. In particular, attention-based methods can effectively capture long-range dependencies and further reconstruct the feature maps for better representation. However, limited to the perspective of spatial and channel attention, and burdened by the huge computational complexity of the self-attention mechanism, such methods struggle to model effective semantic interdependencies between every pixel pair. In this work, we propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations from the perspectives of space, channel and category in a more effective and efficient manner. Concretely, a class augmented attention (CAA) module embedded with a class channel attention (CCA) module can be used to compute category-based correlation and recalibrate the class-level information. Additionally, we introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism via region-wise representations. Extensive experimental results on the ISPRS Vaihingen and Potsdam benchmarks demonstrate the effectiveness and efficiency of our HMANet over other state-of-the-art methods. |
Tasks | Semantic Segmentation |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.02870v1 |
https://arxiv.org/pdf/2001.02870v1.pdf | |
PWC | https://paperswithcode.com/paper/hmanet-hybrid-multiple-attention-network-for |
Repo | |
Framework | |
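The efficiency argument behind region-wise attention is easy to quantify: attending over r x r regional descriptors instead of all h x w pixels shrinks the attention matrix from (hw)^2 entries to (r^2)^2. The sketch below shows only this complexity idea with simple average-pooled regions; HMANet's RSA module additionally shuffles pixels across regions, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w, r = 8, 32, 32, 4
feat = rng.normal(size=(c, h, w))

# Pool the feature map into r x r regions (region-wise representations).
regions = feat.reshape(c, r, h // r, r, w // r).mean(axis=(2, 4))  # (c, r, r)
tokens = regions.reshape(c, r * r).T                               # (r*r, c)

# Plain dot-product self-attention over the 16 region tokens.
scores = tokens @ tokens.T / np.sqrt(c)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = attn @ tokens                                                # (r*r, c)
print(attn.shape, "vs full-resolution attention", (h * w, h * w))
```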
Learning Algebraic Multigrid Using Graph Neural Networks
Title | Learning Algebraic Multigrid Using Graph Neural Networks |
Authors | Ilay Luz, Meirav Galun, Haggai Maron, Ronen Basri, Irad Yavneh |
Abstract | Efficient numerical solvers for sparse linear systems are crucial in science and engineering. One of the fastest methods for solving large-scale sparse linear systems is algebraic multigrid (AMG). The main challenge in the construction of AMG algorithms is the selection of the prolongation operator – a problem-dependent sparse matrix which governs the multiscale hierarchy of the solver and is critical to its efficiency. Over many years, numerous methods have been developed for this task, and yet there is no known single right answer except in very special cases. Here we propose a framework for learning AMG prolongation operators for linear systems with sparse symmetric positive (semi-) definite matrices. We train a single graph neural network to learn a mapping from an entire class of such matrices to prolongation operators, using an efficient unsupervised loss function. Experiments on a broad class of problems demonstrate improved convergence rates compared to classical AMG, demonstrating the potential utility of neural networks for developing sparse system solvers. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05744v1 |
https://arxiv.org/pdf/2003.05744v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-algebraic-multigrid-using-graph |
Repo | |
Framework | |
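The object being learned here, the prolongation operator P, is what turns a smoother into a multigrid method: it defines the Galerkin coarse operator P^T A P and carries coarse corrections back to the fine grid. The sketch below runs a two-grid cycle on a 1-D Poisson matrix with a naive piecewise-constant P; the paper's point is that a graph neural network can choose a much better P for general sparse SPD systems.

```python
import numpy as np

n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson, SPD
b = np.ones(n)

# Naive prolongation: aggregate neighbouring pairs of fine nodes.
P = np.zeros((n, n // 2))
for j in range(n // 2):
    P[2 * j:2 * j + 2, j] = 1.0

Ac = P.T @ A @ P                                       # Galerkin coarse operator

x = np.zeros(n)
for _ in range(20):                                    # two-grid cycles
    for _ in range(2):                                 # damped Jacobi smoothing
        x += 0.6 * (b - A @ x) / np.diag(A)
    r = b - A @ x
    x += P @ np.linalg.solve(Ac, P.T @ r)              # coarse-grid correction
print("residual norm:", np.linalg.norm(b - A @ x))
```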