October 21, 2019

Paper Group AWR 40

The NES Music Database: A multi-instrumental dataset with expressive performance attributes. SING: Symbol-to-Instrument Neural Generator. Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network. Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3. Single Image Reflection Removal Using …

The NES Music Database: A multi-instrumental dataset with expressive performance attributes

Title The NES Music Database: A multi-instrumental dataset with expressive performance attributes
Authors Chris Donahue, Huanru Henry Mao, Julian McAuley
Abstract Existing research on music generation focuses on composition, but often ignores the expressive performance characteristics required for plausible renditions of resultant pieces. In this paper, we introduce the Nintendo Entertainment System Music Database (NES-MDB), a large corpus allowing for separate examination of the tasks of composition and performance. NES-MDB contains thousands of multi-instrumental songs composed for playback by the compositionally-constrained NES audio synthesizer. For each song, the dataset contains a musical score for four instrument voices as well as expressive attributes for the dynamics and timbre of each voice. Unlike datasets consisting of General MIDI files, NES-MDB includes all of the information needed to render exact acoustic performances of the original compositions. Alongside the dataset, we provide a tool that renders generated compositions as NES-style audio by emulating the device’s audio processor. Additionally, we establish baselines for the tasks of composition, which consists of learning the semantics of composing for the NES synthesizer, and performance, which involves finding a mapping between a composition and realistic expressive attributes.
Tasks Music Generation
Published 2018-06-12
URL http://arxiv.org/abs/1806.04278v1
PDF http://arxiv.org/pdf/1806.04278v1.pdf
PWC https://paperswithcode.com/paper/the-nes-music-database-a-multi-instrumental
Repo https://github.com/chrisdonahue/nesmdb
Framework none

SING: Symbol-to-Instrument Neural Generator

Title SING: Symbol-to-Instrument Neural Generator
Authors Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach
Abstract Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
Tasks Music Generation
Published 2018-10-23
URL http://arxiv.org/abs/1810.09785v1
PDF http://arxiv.org/pdf/1810.09785v1.pdf
PWC https://paperswithcode.com/paper/sing-symbol-to-instrument-neural-generator
Repo https://github.com/facebookresearch/SING
Framework pytorch
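
The key ingredient in the SING abstract is a loss on log spectrograms rather than on raw samples. The following is a minimal sketch. It assumes an L1 distance on log(eps + |STFT|^2) with eps = 1 and a Hann window. The frame size and hop are small illustrative values that keep the naive DFT fast; they are not SING's settings. The names power_spectrogram and log_spectral_loss are hypothetical.

```python
import cmath
import math


def power_spectrogram(signal, frame_size=64, hop=16):
    """|STFT|^2 of a 1-D signal, using Hann-windowed frames and a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        # Only the non-negative frequency bins 0 .. frame_size // 2 are kept.
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames


def log_spectral_loss(generated, target, eps=1.0, frame_size=64, hop=16):
    """Mean L1 distance between log(eps + |STFT|^2) of two waveforms."""
    spec_g = power_spectrogram(generated, frame_size, hop)
    spec_t = power_spectrogram(target, frame_size, hop)
    diffs = [
        abs(math.log(eps + pg) - math.log(eps + pt))
        for frame_g, frame_t in zip(spec_g, spec_t)
        for pg, pt in zip(frame_g, frame_t)
    ]
    return sum(diffs) / len(diffs)


sr = 16000  # the sampling rate quoted in the abstract
tone_a4 = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
tone_a5 = [math.sin(2 * math.pi * 880 * t / sr) for t in range(512)]
print(log_spectral_loss(tone_a4, tone_a4))  # 0.0 for identical waveforms
print(log_spectral_loss(tone_a4, tone_a5))  # positive for a different pitch
```

The loss compares only magnitudes, so it ignores the STFT phase. A generated note therefore does not have to match the target waveform sample by sample. A practical implementation would use an FFT, such as torch.stft, instead of this O(N^2) DFT.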

Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network

Title Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network
Authors Hao-Min Liu, Yi-Hsuan Yang
Abstract Research on automatic music generation has seen great progress due to the development of deep neural networks. However, the generation of multi-instrument music of arbitrary genres remains a challenge. Existing research works either on lead sheets or on multi-track piano-rolls found in MIDIs, but both musical notations have their limits. In this work, we propose a new task called lead sheet arrangement to avoid such limits. A new recurrent convolutional generative model for the task is proposed, along with three new symbolic-domain harmonic features to facilitate learning from unpaired lead sheets and MIDIs. Our model can generate eight-bar-long lead sheets and their arrangements. Audio samples of the generated result can be found at https://drive.google.com/open?id=1c0FfODTpudmLvuKBbc23VBCgQizY6-Rk
Tasks Music Generation
Published 2018-07-30
URL http://arxiv.org/abs/1807.11161v1
PDF http://arxiv.org/pdf/1807.11161v1.pdf
PWC https://paperswithcode.com/paper/lead-sheet-generation-and-arrangement-by
Repo https://github.com/liuhaumin/leadsheetgan
Framework tf

Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3

Title Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3
Authors Bilel Benjdira, Taha Khursheed, Anis Koubaa, Adel Ammar, Kais Ouni
Abstract Unmanned Aerial Vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect cars and count them in real time for traffic monitoring purposes. Several deep learning techniques based on convolutional neural networks (CNN) were recently proposed for real-time classification and recognition in computer vision. However, their performance depends on the scenarios where they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrated in this paper that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although they are comparable in the precision metric.
Tasks
Published 2018-12-28
URL http://arxiv.org/abs/1812.10968v1
PDF http://arxiv.org/pdf/1812.10968v1.pdf
PWC https://paperswithcode.com/paper/car-detection-using-unmanned-aerial-vehicles
Repo https://github.com/aniskoubaa/car_detection_yolo_faster_rcnn_uvsc2019
Framework tf
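
The abstract compares the two detectors on precision and sensitivity (recall). The sketch below illustrates how these numbers are commonly computed for box detectors. It greedily matches predicted boxes to ground-truth cars at an IoU threshold of 0.5. The threshold, the greedy matching and the helper names are assumptions, not the paper's evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_sensitivity(predictions, ground_truth, threshold=0.5):
    """predictions: list of (score, box); each ground-truth box matches at most once."""
    matched = set()
    true_positives = 0
    # Visit confident detections first so they claim ground-truth boxes first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    sensitivity = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, sensitivity


cars = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
detections = [(0.9, (1, 1, 10, 10)), (0.8, (21, 19, 31, 29)), (0.3, (60, 60, 70, 70))]
print(precision_sensitivity(detections, cars))  # 2 hits, 1 false alarm, 1 missed car
```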

Single Image Reflection Removal Using Deep Encoder-Decoder Network

Title Single Image Reflection Removal Using Deep Encoder-Decoder Network
Authors Zhixiang Chi, Xiaolin Wu, Xiao Shu, Jinjin Gu
Abstract An image of a scene captured through a piece of transparent and reflective material, such as glass, is often spoiled by a superimposed reflection layer. While separating the reflection from a familiar object in an image is mentally not difficult for humans, it is a challenging, ill-posed problem in computer vision. In this paper, we propose a novel deep convolutional encoder-decoder method to remove the objectionable reflection by learning a map between image pairs with and without reflection. For training the neural network, we model the physical formation of reflections in images and synthesize a large number of photo-realistic reflection-tainted images from reflection-free images collected online. Extensive experimental results show that, although the neural network learns only from synthetic data, the proposed method is effective on real-world images, and it significantly outperforms the other tested state-of-the-art techniques.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1802.00094v1
PDF http://arxiv.org/pdf/1802.00094v1.pdf
PWC https://paperswithcode.com/paper/single-image-reflection-removal-using-deep
Repo https://github.com/LastReLU/Reflection-Separation
Framework pytorch
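
According to the abstract, the training pairs are produced by applying a physical model of reflection formation to clean images, but the abstract does not state that model. The sketch below assumes the common linear blending model I = alpha * T + (1 - alpha) * blur(R). A box blur stands in for the defocus of the reflection layer. The value of alpha, the choice of blur and the function names are illustrative assumptions. Images are grayscale lists of rows with values in [0, 1].

```python
def box_blur(image, radius=1):
    """Blur a 2-D grayscale image (list of rows) with an edge-clamped box filter."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp rows at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp columns at the border
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def synthesize_reflection(transmission, reflection, alpha=0.8, radius=1):
    """Assumed model: I = alpha * T + (1 - alpha) * blur(R), a convex blend in [0, 1]."""
    blurred = box_blur(reflection, radius)
    return [
        [alpha * t + (1.0 - alpha) * r for t, r in zip(row_t, row_r)]
        for row_t, row_r in zip(transmission, blurred)
    ]


clean = [[0.5] * 4 for _ in range(4)]
glare = [[1.0 if x == y else 0.0 for x in range(4)] for y in range(4)]
tainted = synthesize_reflection(clean, glare, alpha=0.8)
training_pair = (tainted, clean)  # network input and its reflection-free target
```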

StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax

Title StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax
Authors Mykyta Viazovskyi, Michal Certicky
Abstract Real-Time Strategy (RTS) games have recently become a popular testbed for artificial intelligence research. They represent a complex adversarial domain providing a number of interesting AI challenges. There exists a wide variety of research-supporting software tools, libraries and frameworks for one RTS game in particular, StarCraft: Brood War. These tools are designed to address various specific sub-problems, such as resource allocation or opponent modelling, so that researchers can focus exclusively on the tasks relevant to them. We present one such tool, a library called StarAlgo, that produces plans for the coordinated movement of squads (groups of combat units) within the game world. The StarAlgo library can solve the squad movement planning problem using one of two algorithms: Monte Carlo Tree Search Considering Durations (MCTSCD) and a slightly modified version of Negamax. We evaluate both algorithms, compare them, and demonstrate their usage. The library is implemented as a static C++ library that can be easily plugged into most StarCraft AI bots.
Tasks Real-Time Strategy Games, Starcraft
Published 2018-12-29
URL http://arxiv.org/abs/1812.11371v1
PDF http://arxiv.org/pdf/1812.11371v1.pdf
PWC https://paperswithcode.com/paper/staralgo-a-squad-movement-planning-library
Repo https://github.com/Games-and-Simulations/StarAlgo
Framework none
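
StarAlgo offers a slightly modified Negamax as one of its two planners. The abstract does not describe the modification, so the sketch below shows only textbook Negamax with alpha-beta pruning, written in Python. The toy tree, its leaf scores and the function signature are hypothetical and unrelated to StarAlgo's C++ API.

```python
import math


def negamax(node, depth, alpha, beta, color, children, evaluate):
    """Return the best achievable score for the player to move at node.

    children(node) lists successor states, evaluate(node) scores a state from the
    first player's point of view, and color is +1 for that player, -1 otherwise.
    """
    successors = children(node)
    if depth == 0 or not successors:
        return color * evaluate(node)
    best = -math.inf
    for child in successors:
        # The opponent's best outcome is our worst: negate and flip the window.
        score = -negamax(child, depth - 1, -beta, -alpha, -color, children, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # alpha-beta cut-off: the opponent will avoid this line
    return best


# Hypothetical two-ply game tree with leaf scores for the first player.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF_SCORES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}


def children(node):
    return TREE.get(node, [])


def evaluate(node):
    return LEAF_SCORES.get(node, 0)


print(negamax("root", 2, -math.inf, math.inf, 1, children, evaluate))  # 3
```

Negamax relies on the identity max(a, b) = -min(-a, -b). Because of this identity, one recursive function can serve both players by negating each child's score. On the toy tree, leaf b2 is never evaluated. Once b1 shows that the opponent can hold the first player to 2, branch b cannot beat the 3 already secured through branch a.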

How Does Batch Normalization Help Optimization?

Title How Does Batch Normalization Help Optimization?
Authors Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
Abstract Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm’s effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers’ input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11604v5
PDF http://arxiv.org/pdf/1805.11604v5.pdf
PWC https://paperswithcode.com/paper/how-does-batch-normalization-help
Repo https://github.com/peteraugustine/seg3
Framework none
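
For reference, the transformation studied in this paper normalizes each feature using mini-batch statistics. It then applies a learned scale and shift: y = gamma * (x - mean) / sqrt(var + eps) + beta. Below is a minimal training-mode sketch in plain Python. It shows only the mechanism and does not address the paper's smoothness argument.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm over a mini-batch.

    batch is a list of samples, each a list of feature values. Every feature is
    normalized with the mean and (biased) variance computed across the batch,
    then scaled by gamma[j] and shifted by beta[j].
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i, v in enumerate(column):
            out[i][j] = gamma[j] * (v - mean) / math.sqrt(var + eps) + beta[j]
    return out


batch = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
# Both columns become approximately [-1.22, 0.0, 1.22] despite different scales.
```

At inference time, implementations replace the batch statistics with running averages collected during training.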

Know What You Don’t Know: Unanswerable Questions for SQuAD

Title Know What You Don’t Know: Unanswerable Questions for SQuAD
Authors Pranav Rajpurkar, Robin Jia, Percy Liang
Abstract Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0.
Tasks Question Answering, Reading Comprehension
Published 2018-06-11
URL http://arxiv.org/abs/1806.03822v1
PDF http://arxiv.org/pdf/1806.03822v1.pdf
PWC https://paperswithcode.com/paper/know-what-you-dont-know-unanswerable
Repo https://github.com/leozhoujf/DataSciComp
Framework none
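
SQuAD 2.0 scores answers with token-level F1, and abstention earns credit only on truly unanswerable questions. The sketch below follows the normalization and F1 logic of the official SQuAD evaluation script in simplified form. An empty string stands for "no answer", and only one gold answer is compared.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, squeeze spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, gold):
    """Token-level F1, where an empty string means 'no answer'."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Abstaining scores 1 only when the question is unanswerable.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(f1_score("the Denver Broncos", "Denver Broncos"))  # 1.0
print(f1_score("", ""))                                  # 1.0 (correct abstention)
print(f1_score("Denver Broncos", ""))                    # 0.0 (should have abstained)
```

The official script takes the maximum F1 over all gold answers for a question.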

Learning Confidence for Out-of-Distribution Detection in Neural Networks

Title Learning Confidence for Out-of-Distribution Detection in Neural Networks
Authors Terrance DeVries, Graham W. Taylor
Abstract Modern neural networks are very powerful predictive models, but they are often incapable of recognizing when their predictions may be wrong. Closely related to this is the task of out-of-distribution detection, where a network must determine whether or not an input is outside of the set on which it is expected to safely perform. To jointly address these issues, we propose a method of learning confidence estimates for neural networks that is simple to implement and produces intuitively interpretable outputs. We demonstrate that on the task of out-of-distribution detection, our technique surpasses recently proposed techniques which construct confidence based on the network’s output distribution, without requiring any additional labels or access to out-of-distribution examples. Additionally, we address the problem of calibrating out-of-distribution detectors, where we demonstrate that misclassified in-distribution examples can be used as a proxy for out-of-distribution examples.
Tasks Out-of-Distribution Detection
Published 2018-02-13
URL http://arxiv.org/abs/1802.04865v1
PDF http://arxiv.org/pdf/1802.04865v1.pdf
PWC https://paperswithcode.com/paper/learning-confidence-for-out-of-distribution
Repo https://github.com/nathanin/ood
Framework tf
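
The abstract does not give the training objective. As we understand the paper, the network has an extra confidence output c in (0, 1). During training, its softmax prediction p is interpolated toward the one-hot label y as c * p + (1 - c) * y. A penalty of -log(c), weighted by lambda, discourages the network from relying on these hints. The single-example sketch below follows that reading. The value lambda = 0.1 and the function names are illustrative, and the paper's further training details are omitted.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(logits, confidence_logit, label, lam=0.1):
    """Task loss on hint-adjusted predictions plus a -log(c) confidence penalty."""
    probs = softmax(logits)
    c = 1.0 / (1.0 + math.exp(-confidence_logit))  # sigmoid keeps c in (0, 1)
    target = [1.0 if i == label else 0.0 for i in range(len(probs))]
    # Low confidence pulls the prediction toward the true label (a "hint").
    adjusted = [c * p + (1.0 - c) * y for p, y in zip(probs, target)]
    task_loss = -math.log(adjusted[label])
    return task_loss + lam * -math.log(c)


# A wrong, overconfident prediction is cheaper if the network lowers c.
wrong = [0.0, 10.0, 0.0]
print(confidence_loss(wrong, confidence_logit=10.0, label=0))   # about 9.3
print(confidence_loss(wrong, confidence_logit=-10.0, label=0))  # about 1.0
```

At test time, a low value of c serves as the out-of-distribution score.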

Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference

Title Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference
Authors Kumar Shridhar, Felix Laumann, Marcus Liwicki
Abstract We introduce a novel uncertainty estimation method for classification tasks for Bayesian convolutional neural networks with variational inference. By normalizing the output of a Softplus function in the final layer, we estimate aleatoric and epistemic uncertainty in a coherent manner. The intractable posterior probability distributions over weights are inferred by Bayes by Backprop. Firstly, we demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures. On multiple datasets in supervised learning settings (MNIST, CIFAR-10, CIFAR-100), this variational inference method achieves performance equivalent to frequentist inference in identical architectures, while the two desiderata, a measure of uncertainty and regularization, are incorporated naturally. Secondly, we examine how our proposed measure for aleatoric and epistemic uncertainties is derived and validate it on the aforementioned datasets.
Tasks Bayesian Inference
Published 2018-06-15
URL https://arxiv.org/abs/1806.05978v6
PDF https://arxiv.org/pdf/1806.05978v6.pdf
PWC https://paperswithcode.com/paper/bayesian-convolutional-neural-networks-with-1
Repo https://github.com/kumar-shridhar/PyTorch-Softplus-Normalization-Uncertainty-Estimation-Bayesian-CNN
Framework pytorch
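
The central step in this abstract is to normalize the Softplus outputs of the final layer so they sum to one, instead of applying a softmax. The sketch shows that step. As an assumption, it also shows one common way to split predictive variance into aleatoric and epistemic parts using T stochastic forward passes, computing only the per-class diagonal. The sampled logits are hypothetical inputs. In a real Bayesian CNN trained with Bayes by Backprop, they would come from sampling the weights.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softplus_normalize(logits):
    """Map final-layer outputs to a probability vector via normalized Softplus."""
    values = [softplus(z) for z in logits]
    total = sum(values)
    return [v / total for v in values]


def uncertainty_diagonals(sampled_logits):
    """Per-class mean, aleatoric and epistemic variances from T sampled passes.

    Assumed decomposition: aleatoric_i = mean_t[p_ti - p_ti^2] and
    epistemic_i = mean_t[(p_ti - mean_i)^2].
    """
    probs = [softplus_normalize(z) for z in sampled_logits]
    t = len(probs)
    k = len(probs[0])
    mean = [sum(p[i] for p in probs) / t for i in range(k)]
    aleatoric = [sum(p[i] - p[i] ** 2 for p in probs) / t for i in range(k)]
    epistemic = [sum((p[i] - mean[i]) ** 2 for p in probs) / t for i in range(k)]
    return mean, aleatoric, epistemic


# Hypothetical logits from three forward passes with freshly sampled weights.
passes = [[2.0, 0.5, -1.0], [1.5, 0.8, -0.5], [2.5, 0.2, -1.5]]
mean, aleatoric, epistemic = uncertainty_diagonals(passes)
```

When all passes agree, the epistemic term vanishes, and only the aleatoric spread of each prediction remains.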

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

Title StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
Authors Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig
Abstract Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data and from readily available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models.
Tasks Code Generation, Latent Variable Models, Semantic Parsing
Published 2018-06-20
URL http://arxiv.org/abs/1806.07832v1
PDF http://arxiv.org/pdf/1806.07832v1.pdf
PWC https://paperswithcode.com/paper/structvae-tree-structured-latent-variable
Repo https://github.com/pcyin/tranX
Framework pytorch

SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth

Title SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth
Authors Yuankai Huo, Zhoubing Xu, Hyeonsoo Moon, Shunxing Bao, Albert Assad, Tamara K. Moyo, Michael R. Savona, Richard G. Abramson, Bennett A. Landman
Abstract A key limitation of image segmentation methods based on deep convolutional neural networks (DCNN) is the lack of generalizability. Manually traced training images are typically required when segmenting organs in a new imaging modality or from a distinct disease cohort. The manual effort can be alleviated if the manually traced images in one imaging modality (e.g., MRI) can be used to train a segmentation network for another imaging modality (e.g., CT). In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a segmentation network for a target imaging modality without having manual labels. SynSeg-Net is trained by using (1) unpaired intensity images from source and target modalities, and (2) manual labels only from the source modality. SynSeg-Net is enabled by the recent advances of cycle generative adversarial networks (CycleGAN) and DCNN. We evaluate the performance of the SynSeg-Net on two experiments: (1) MRI to CT splenomegaly synthetic segmentation for abdominal images, and (2) CT to MRI total intracranial volume (TICV) synthetic segmentation for brain images. The proposed end-to-end approach achieved superior performance to two-stage methods. Moreover, the SynSeg-Net achieved comparable performance to the traditional segmentation network using target modality labels in certain scenarios. The source code of SynSeg-Net is publicly available (https://github.com/MASILab/SynSeg-Net).
Tasks Semantic Segmentation
Published 2018-10-15
URL https://arxiv.org/abs/1810.06498v2
PDF https://arxiv.org/pdf/1810.06498v2.pdf
PWC https://paperswithcode.com/paper/synseg-net-synthetic-segmentation-without
Repo https://github.com/MASILab/SynSeg-Net
Framework caffe2
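
SynSeg-Net builds on CycleGAN. In CycleGAN, a generator G (source to target) and a generator F (target to source) are tied together by a cycle-consistency loss. SynSeg-Net additionally trains a segmentation network on synthesized target images, using only source labels. The sketch shows the cycle loss and, as an assumption, a pixel-wise binary cross-entropy for the segmentation term. The generators here are hypothetical toy functions on flattened images, and the adversarial losses are omitted.

```python
import math


def mean_l1(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x_source, y_target, g, f):
    """CycleGAN cycle loss ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 (as means here)."""
    return mean_l1(f(g(x_source)), x_source) + mean_l1(g(f(y_target)), y_target)


def segmentation_bce(pred_probs, source_labels, eps=1e-7):
    """Pixel-wise binary cross-entropy; labels come from the source modality only."""
    total = 0.0
    for p, y in zip(pred_probs, source_labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(source_labels)


# Hypothetical toy "generators" mapping MRI-like intensities to CT-like and back.
def mri_to_ct(image):
    return [2.0 * v for v in image]


def ct_to_mri(image):
    return [v / 2.0 for v in image]


print(cycle_consistency_loss([0.1, 0.4], [0.6, 0.2], mri_to_ct, ct_to_mri))  # 0.0
```

In the MRI-to-CT experiment, the segmentation term would be applied to the segmenter's output on G(x). Here x is a labeled MRI image, so no CT labels are needed.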

Learning Disentangled Joint Continuous and Discrete Representations

Title Learning Disentangled Joint Continuous and Discrete Representations
Authors Emilien Dupont
Abstract We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent.
Tasks
Published 2018-03-31
URL http://arxiv.org/abs/1804.00104v3
PDF http://arxiv.org/pdf/1804.00104v3.pdf
PWC https://paperswithcode.com/paper/learning-disentangled-joint-continuous-and
Repo https://github.com/Schlumberger/joint-vae
Framework pytorch
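
JointVAE relaxes its discrete latent with a Gumbel-Softmax (Concrete) distribution. It controls the information in each latent by penalizing the deviation of that latent's KL term from a target capacity C. Below is a minimal sketch of these two pieces for one categorical latent with a uniform prior. The temperature, gamma and capacity values are illustrative assumptions. The continuous Gaussian latents, which use an analogous capacity term, are omitted.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=0.67, rng=random):
    """Draw a relaxed one-hot sample from a categorical with the given logits."""
    noisy = []
    for z in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log arguments finite
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((z + gumbel) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical_uniform(probs):
    """KL(q || Uniform(K)) for a categorical posterior q."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)


def capacity_penalty(kl, capacity, gamma=30.0):
    """Penalize the KL term for deviating from a target capacity C (in nats)."""
    return gamma * abs(kl - capacity)


rng = random.Random(0)
relaxed = gumbel_softmax_sample([2.0, 0.5, 0.1], temperature=0.67, rng=rng)
posterior = [0.7, 0.2, 0.1]  # hypothetical q(c|x) for one input
penalty = capacity_penalty(kl_categorical_uniform(posterior), capacity=0.5)
```

Lower temperatures push the relaxed sample closer to a one-hot vector, while the sample stays differentiable with respect to the logits.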

WaveGlow: A Flow-based Generative Network for Speech Synthesis

Title WaveGlow: A Flow-based Generative Network for Speech Synthesis
Authors Ryan Prenger, Rafael Valle, Bryan Catanzaro
Abstract In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
Tasks Speech Synthesis
Published 2018-10-31
URL http://arxiv.org/abs/1811.00002v1
PDF http://arxiv.org/pdf/1811.00002v1.pdf
PWC https://paperswithcode.com/paper/waveglow-a-flow-based-generative-network-for
Repo https://github.com/yanggeng1995/WaveGlow
Framework tf
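
WaveGlow inherits the affine coupling layer from Glow. This layer leaves half of the channels unchanged and uses them to predict a scale and shift for the other half. As a result, the layer is exactly invertible, and its log-determinant is the sum of the log-scales. Below is a minimal sketch operating on a plain list. The function toy_network is a hypothetical stand-in for WaveGlow's WaveNet-like network, which in the real model is also conditioned on the mel-spectrogram. The invertible 1x1 convolutions are omitted.

```python
import math


def toy_network(x_a):
    """Hypothetical stand-in for WN(x_a, mel): returns (log_s, t) per channel."""
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x, network=toy_network):
    """Affine coupling: y_a = x_a, y_b = exp(log_s) * x_b + t."""
    assert len(x) % 2 == 0, "this sketch splits channels into equal halves"
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = network(x_a)
    y_b = [math.exp(ls) * xb + tt for ls, xb, tt in zip(log_s, x_b, t)]
    return x_a + y_b, sum(log_s)  # log|det J| is the sum of the log-scales


def coupling_inverse(y, network=toy_network):
    """Exact inverse: y_a equals x_a, so the same log_s and t are recomputed."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = network(y_a)
    x_b = [(yb - tt) * math.exp(-ls) for ls, yb, tt in zip(log_s, y_b, t)]
    return y_a + x_b


def flow_nll(z, log_det_total, sigma=1.0):
    """Negative log-likelihood up to a constant: z.z / (2 sigma^2) - sum of log-dets."""
    return sum(v * v for v in z) / (2 * sigma ** 2) - log_det_total


x = [0.3, -1.2, 0.7, 2.0]
z, log_det = coupling_forward(x)
loss = flow_nll(z, log_det)
```

In a full model, the log-determinants from every flow step (coupling layers and 1x1 convolutions) are summed. Minimizing flow_nll then corresponds to the single maximum-likelihood cost described in the abstract.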

Personalizing Similar Product Recommendations in Fashion E-commerce

Title Personalizing Similar Product Recommendations in Fashion E-commerce
Authors Pankaj Agarwal, Sreekanth Vempati, Sumit Borar
Abstract In fashion e-commerce platforms, product discovery is one of the key components of a good user experience. There are numerous ways by which people find the products they desire. Similar product recommendations are one of the popular modes by which users find products that resonate with their intent. Generally, these recommendations are not personalized to a specific user. Traditionally, collaborative-filtering-based approaches have been popular in the literature for recommending non-personalized products given a query product. There has also been a focus on personalizing the product listing for a given user. In this paper, we marry these approaches so that users are recommended personalized similar products. Our experimental results on a large fashion e-commerce platform (Myntra) show that we can improve the key metrics by applying personalization on similar product recommendations.
Tasks
Published 2018-06-29
URL https://arxiv.org/abs/1806.11371v1
PDF https://arxiv.org/pdf/1806.11371v1.pdf
PWC https://paperswithcode.com/paper/personalizing-similar-product-recommendations
Repo https://github.com/manohar029/Ecommerce-Implicit-Data-Recommender-System
Framework none
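
The abstract does not describe how the two signals are combined. Purely as a hypothetical illustration of personalized similar products, the sketch re-ranks candidates by a weighted blend of two scores. The first is cosine similarity to the query product. The second is cosine affinity to a user embedding. The embeddings, the blending weight and the function names are invented for this example and do not describe the Myntra system.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_similar(query_id, user_vector, item_vectors, weight=0.3, top_k=3):
    """Rank candidates by (1 - weight) * sim(query, item) + weight * sim(user, item).

    item_vectors maps hypothetical product ids to embeddings.
    """
    query = item_vectors[query_id]
    scored = []
    for item_id, vec in item_vectors.items():
        if item_id == query_id:
            continue  # never recommend the product being viewed
        score = (1 - weight) * cosine(query, vec) + weight * cosine(user_vector, vec)
        scored.append((score, item_id))
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]


items = {"q": [1.0, 0.0, 0.0], "a": [0.9, 0.2, 0.0], "b": [0.9, 0.0, 0.3], "c": [0.0, 1.0, 0.0]}
user = [0.0, 0.0, 1.0]  # hypothetical taste vector leaning toward the third attribute
print(personalized_similar("q", user, items, weight=0.0))  # ['a', 'b', 'c']
print(personalized_similar("q", user, items, weight=0.5))  # ['b', 'a', 'c']
```

With weight=0, the ranking reduces to non-personalized similar-product recommendations, which makes the personalization effect easy to isolate.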