October 21, 2019


Paper Group AWR 40

The NES Music Database: A multi-instrumental dataset with expressive performance attributes. SING: Symbol-to-Instrument Neural Generator. Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network. Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3. Single Image Reflection Removal Using …

The NES Music Database: A multi-instrumental dataset with expressive performance attributes


Title	The NES Music Database: A multi-instrumental dataset with expressive performance attributes
Authors	Chris Donahue, Huanru Henry Mao, Julian McAuley
Abstract	Existing research on music generation focuses on composition, but often ignores the expressive performance characteristics required for plausible renditions of resultant pieces. In this paper, we introduce the Nintendo Entertainment System Music Database (NES-MDB), a large corpus allowing for separate examination of the tasks of composition and performance. NES-MDB contains thousands of multi-instrumental songs composed for playback by the compositionally-constrained NES audio synthesizer. For each song, the dataset contains a musical score for four instrument voices as well as expressive attributes for the dynamics and timbre of each voice. Unlike datasets comprised of General MIDI files, NES-MDB includes all of the information needed to render exact acoustic performances of the original compositions. Alongside the dataset, we provide a tool that renders generated compositions as NES-style audio by emulating the device’s audio processor. Additionally, we establish baselines for the tasks of composition, which consists of learning the semantics of composing for the NES synthesizer, and performance, which involves finding a mapping between a composition and realistic expressive attributes.
Tasks	Music Generation
Published	2018-06-12
URL	http://arxiv.org/abs/1806.04278v1
PDF	http://arxiv.org/pdf/1806.04278v1.pdf
PWC	https://paperswithcode.com/paper/the-nes-music-database-a-multi-instrumental
Repo	https://github.com/chrisdonahue/nesmdb
Framework	none

SING: Symbol-to-Instrument Neural Generator


Title	SING: Symbol-to-Instrument Neural Generator
Authors	Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach
Abstract	Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference.
Tasks	Music Generation
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09785v1
PDF	http://arxiv.org/pdf/1810.09785v1.pdf
PWC	https://paperswithcode.com/paper/sing-symbol-to-instrument-neural-generator
Repo	https://github.com/facebookresearch/SING
Framework	pytorch
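
The SING abstract hinges on a loss computed between log spectrograms rather than between raw samples. A minimal pure-Python sketch follows, assuming a Hann-windowed STFT, a log-power spectrogram log(eps + |X|^2) and an L1 distance; the frame size, hop, eps and function names are illustrative choices, not values taken from the paper or the facebookresearch/SING repository.

```python
import cmath
import math


def stft_magnitude_sq(signal, frame_size=64, hop=16):
    """Return |STFT|^2 as a list of frames, each holding frame_size // 2 + 1 bins."""
    # Hann window to reduce spectral leakage at frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k: O(N^2) per frame, acceptable for a sketch
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc) ** 2)
        frames.append(bins)
    return frames


def log_spectral_l1(generated, target, eps=1.0, **stft_kwargs):
    """Mean absolute difference between log(eps + |STFT|^2) of two waveforms."""
    gen = stft_magnitude_sq(generated, **stft_kwargs)
    tgt = stft_magnitude_sq(target, **stft_kwargs)
    total, count = 0.0, 0
    for g_frame, t_frame in zip(gen, tgt):
        for g, t in zip(g_frame, t_frame):
            total += abs(math.log(eps + g) - math.log(eps + t))
            count += 1
    return total / count if count else 0.0
```

Because only |X|^2 enters the loss, the phase of each STFT bin is ignored, so the generator is not forced to match the target waveform sample by sample.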

Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network


Title	Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network
Authors	Hao-Min Liu, Yi-Hsuan Yang
Abstract	Research on automatic music generation has seen great progress due to the development of deep neural networks. However, the generation of multi-instrument music of arbitrary genres still remains a challenge. Existing research either works on lead sheets or multi-track piano-rolls found in MIDIs, but both musical notations have their limits. In this work, we propose a new task called lead sheet arrangement to avoid such limits. A new recurrent convolutional generative model for the task is proposed, along with three new symbolic-domain harmonic features to facilitate learning from unpaired lead sheets and MIDIs. Our model can generate eight-bar-long lead sheets and their arrangements. Audio samples of the generated result can be found at https://drive.google.com/open?id=1c0FfODTpudmLvuKBbc23VBCgQizY6-Rk
Tasks	Music Generation
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11161v1
PDF	http://arxiv.org/pdf/1807.11161v1.pdf
PWC	https://paperswithcode.com/paper/lead-sheet-generation-and-arrangement-by
Repo	https://github.com/liuhaumin/leadsheetgan
Framework	tf

Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3


Title	Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3
Authors	Bilel Benjdira, Taha Khursheed, Anis Koubaa, Adel Ammar, Kais Ouni
Abstract	Unmanned Aerial Vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect cars and count them in real-time for traffic monitoring purposes. Several deep learning techniques based on convolutional neural networks (CNNs) were recently proposed for real-time classification and recognition in computer vision. However, their performance depends on the scenarios where they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrated in this paper that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although they are comparable in the precision metric.
Tasks
Published	2018-12-28
URL	http://arxiv.org/abs/1812.10968v1
PDF	http://arxiv.org/pdf/1812.10968v1.pdf
PWC	https://paperswithcode.com/paper/car-detection-using-unmanned-aerial-vehicles
Repo	https://github.com/aniskoubaa/car_detection_yolo_faster_rcnn_uvsc2019
Framework	tf
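
The comparison above is stated in terms of precision and sensitivity (recall). A minimal sketch of how such metrics are commonly computed for box detections is shown below; the greedy IoU matching, the 0.5 threshold and the (x1, y1, x2, y2) box format are our assumptions, not necessarily the evaluation protocol used in the paper or its repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_sensitivity(detections, ground_truth, threshold=0.5):
    """Greedily match detections (highest score first) to unmatched ground-truth boxes.

    detections: list of (score, box); ground_truth: list of boxes.
    Returns (precision, sensitivity), where sensitivity = TP / (TP + FN).
    """
    matched = set()
    true_pos = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_idx, best_iou = None, threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth car can be claimed only once
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(detections) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if detections else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, sensitivity
```

A detector that finds more of the cars raises sensitivity even if precision stays flat, which is the kind of difference the abstract reports between the two models.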

Single Image Reflection Removal Using Deep Encoder-Decoder Network


Title	Single Image Reflection Removal Using Deep Encoder-Decoder Network
Authors	Zhixiang Chi, Xiaolin Wu, Xiao Shu, Jinjin Gu
Abstract	An image of a scene captured through a piece of transparent and reflective material, such as glass, is often spoiled by a superimposed layer of reflection image. While separating the reflection from a familiar object in an image is not difficult for humans, it is a challenging, ill-posed problem in computer vision. In this paper, we propose a novel deep convolutional encoder-decoder method to remove the objectionable reflection by learning a map between image pairs with and without reflection. For training the neural network, we model the physical formation of reflections in images and synthesize a large number of photo-realistic reflection-tainted images from reflection-free images collected online. Extensive experimental results show that, although the neural network learns only from synthetic data, the proposed method is effective on real-world images, and it significantly outperforms the other tested state-of-the-art techniques.
Tasks
Published	2018-01-31
URL	http://arxiv.org/abs/1802.00094v1
PDF	http://arxiv.org/pdf/1802.00094v1.pdf
PWC	https://paperswithcode.com/paper/single-image-reflection-removal-using-deep
Repo	https://github.com/LastReLU/Reflection-Separation
Framework	pytorch
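
The training data here is synthesized by modelling how reflections form. The sketch below uses one common simplified model, I = alpha * T + (1 - alpha) * blur(R), on grayscale images stored as nested lists; the blend weight, the 3x3 box blur standing in for a defocus kernel, and the function names are assumptions for illustration, and the paper's own formation model may differ.

```python
def box_blur(image):
    """3x3 box blur with edge clamping; image is a list of rows of floats in [0, 1]."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx]
            row.append(acc / 9.0)
        out.append(row)
    return out


def synthesize_reflection(transmission, reflection, alpha=0.7):
    """Blend a clean image with a blurred reflection layer, clipped to [0, 1]."""
    blurred = box_blur(reflection)
    return [
        [min(1.0, max(0.0, alpha * t + (1.0 - alpha) * r)) for t, r in zip(t_row, r_row)]
        for t_row, r_row in zip(transmission, blurred)
    ]
```

Each pair (synthesize_reflection(T, R), T) then serves as one (input, target) example for an encoder-decoder that learns to map reflection-tainted images back to clean ones.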

StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax


Title	StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax
Authors	Mykyta Viazovskyi, Michal Certicky
Abstract	Real-Time Strategy (RTS) games have recently become a popular testbed for artificial intelligence research. They represent a complex adversarial domain providing a number of interesting AI challenges. There exists a wide variety of research-supporting software tools, libraries and frameworks for one RTS game in particular – StarCraft: Brood War. These tools are designed to address various specific sub-problems, such as resource allocation or opponent modelling, so that researchers can focus exclusively on the tasks relevant to them. We present one such tool – a library called StarAlgo that produces plans for the coordinated movement of squads (groups of combat units) within the game world. The StarAlgo library can solve the squad movement planning problem using one of two algorithms: Monte Carlo Tree Search Considering Durations (MCTSCD) and a slightly modified version of Negamax. We evaluate both algorithms, compare them, and demonstrate their usage. The library is implemented as a static C++ library that can be easily plugged into most StarCraft AI bots.
Tasks	Real-Time Strategy Games, Starcraft
Published	2018-12-29
URL	http://arxiv.org/abs/1812.11371v1
PDF	http://arxiv.org/pdf/1812.11371v1.pdf
PWC	https://paperswithcode.com/paper/staralgo-a-squad-movement-planning-library
Repo	https://github.com/Games-and-Simulations/StarAlgo
Framework	none
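
StarAlgo offers Negamax as one of its two planners. For readers unfamiliar with it, here is a minimal generic negamax with alpha-beta pruning in Python, applied to a toy take-away game (remove 1 to 3 stones per turn; whoever takes the last stone wins). It is not StarAlgo's modified variant and does not reflect the library's C++ API; the game and function names are hypothetical.

```python
import math


def negamax(state, depth, alpha, beta, moves, apply_move, evaluate):
    """Return the value of `state` from the point of view of the player to move.

    Negamax relies on the zero-sum identity max(a, b) = -min(-a, -b): each
    child's score is negated because the opponent moves next.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    best = -math.inf
    for move in legal:
        score = -negamax(apply_move(state, move), depth - 1, -beta, -alpha,
                         moves, apply_move, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the opponent will never allow this line; prune remaining moves
    return best


# Toy game: a pile of stones, remove 1-3 per turn, taking the last stone wins.
def nim_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def nim_apply(stones, k):
    return stones - k


def nim_eval(stones):
    # With no stones left, the player to move has lost (the opponent took the last one).
    return -1 if stones == 0 else 0
```

Calling negamax(4, 10, -math.inf, math.inf, nim_moves, nim_apply, nim_eval) searches the full game tree from a pile of four stones; in squad planning the same recursion would run over game states and squad moves instead.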

How Does Batch Normalization Help Optimization?


Title	How Does Batch Normalization Help Optimization?
Authors	Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
Abstract	Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm’s effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers’ input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
Tasks
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11604v5
PDF	http://arxiv.org/pdf/1805.11604v5.pdf
PWC	https://paperswithcode.com/paper/how-does-batch-normalization-help
Repo	https://github.com/peteraugustine/seg3
Framework	none
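
The abstract takes the BatchNorm transform itself as given. For reference, a minimal training-mode forward pass is sketched below in pure Python: each feature is normalized with the mini-batch mean and variance, then scaled and shifted by the learnable gamma and beta. Running statistics for inference are omitted, and eps = 1e-5 is a conventional default rather than a value from the paper.

```python
import math


def batchnorm_forward(batch, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm over a mini-batch.

    batch: list of samples, each a list of feature values (N x D).
    gamma, beta: per-feature scale and shift (length D).
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        # Biased (1/N) variance of the mini-batch, as used for normalization in training
        var = sum((v - mean) ** 2 for v in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) * inv_std + beta[j]
    return out
```

The code only shows what the layer computes; the paper's point is that its benefit comes from a smoother optimization landscape rather than from the per-layer distributional stability this normalization appears to enforce.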

Know What You Don’t Know: Unanswerable Questions for SQuAD


Title	Know What You Don’t Know: Unanswerable Questions for SQuAD
Authors	Pranav Rajpurkar, Robin Jia, Percy Liang
Abstract	Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0.
Tasks	Question Answering, Reading Comprehension
Published	2018-06-11
URL	http://arxiv.org/abs/1806.03822v1
PDF	http://arxiv.org/pdf/1806.03822v1.pdf
PWC	https://paperswithcode.com/paper/know-what-you-dont-know-unanswerable
Repo	https://github.com/leozhoujf/DataSciComp
Framework	none
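
SQuAD 2.0 rewards a system for abstaining only when no answer is supported. The sketch below follows the token-level F1 logic of the official SQuAD 2.0 evaluation script as we understand it (answer normalization, and an empty string for "no answer" that scores 1 only when the gold answer is also empty); treat it as an approximation, not a replacement for the official script. The thresholded abstention rule at the end is a hypothetical illustration.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(gold, pred):
    """Token-overlap F1; an empty string means "no answer"."""
    gold_toks = normalize_answer(gold).split() if gold else []
    pred_toks = normalize_answer(pred).split() if pred else []
    if not gold_toks or not pred_toks:
        # Abstaining only scores when the question really is unanswerable
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_f1(gold_answers, pred):
    """Score against every gold answer and keep the best; no gold answers = unanswerable."""
    return max(token_f1(g, pred) for g in (gold_answers or [""]))


def answer_or_abstain(best_span, span_score, null_score, threshold=0.0):
    """Hypothetical decision rule: abstain when the no-answer score wins by > threshold."""
    return "" if null_score - span_score > threshold else best_span
```

A system tuned only on SQuAD 1.1 effectively never returns "", so every unanswerable question scores 0 under this metric, which is consistent with the F1 drop the abstract reports.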

Learning Confidence for Out-of-Distribution Detection in Neural Networks


Title	Learning Confidence for Out-of-Distribution Detection in Neural Networks
Authors	Terrance DeVries, Graham W. Taylor
Abstract	Modern neural networks are very powerful predictive models, but they are often incapable of recognizing when their predictions may be wrong. Closely related to this is the task of out-of-distribution detection, where a network must determine whether or not an input is outside of the set on which it is expected to safely perform. To jointly address these issues, we propose a method of learning confidence estimates for neural networks that is simple to implement and produces intuitively interpretable outputs. We demonstrate that on the task of out-of-distribution detection, our technique surpasses recently proposed techniques which construct confidence based on the network’s output distribution, without requiring any additional labels or access to out-of-distribution examples. Additionally, we address the problem of calibrating out-of-distribution detectors, where we demonstrate that misclassified in-distribution examples can be used as a proxy for out-of-distribution examples.
Tasks	Out-of-Distribution Detection
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04865v1
PDF	http://arxiv.org/pdf/1802.04865v1.pdf
PWC	https://paperswithcode.com/paper/learning-confidence-for-out-of-distribution
Repo	https://github.com/nathanin/ood
Framework	tf
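
The abstract does not spell out how the confidence is learned. As we understand the paper, the network emits a confidence c in [0, 1] alongside its softmax output p; during training the prediction is interpolated toward the one-hot label in proportion to (1 - c), and a -log(c) penalty keeps the network from always asking for these hints. The pure-Python sketch below computes that loss for a single example; lambda = 0.1 is an illustrative value, not a setting from the paper, and refinements such as applying hints to only part of each batch are omitted.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(logits, confidence_logit, label, lam=0.1):
    """Task loss on confidence-interpolated probabilities plus a -log(c) penalty.

    logits: class scores; confidence_logit: scalar passed through a sigmoid;
    label: index of the true class.
    """
    probs = softmax(logits)
    c = 1.0 / (1.0 + math.exp(-confidence_logit))  # confidence in (0, 1)
    # Interpolate toward the one-hot target: low confidence "asks for a hint"
    adjusted = [c * p + (1.0 - c) * (1.0 if k == label else 0.0)
                for k, p in enumerate(probs)]
    task_loss = -math.log(adjusted[label])
    confidence_penalty = -math.log(c)
    return task_loss + lam * confidence_penalty
```

At test time the learned c is read off directly as a score, with low values flagging inputs that are likely out-of-distribution.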

Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference


Title	Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference
Authors	Kumar Shridhar, Felix Laumann, Marcus Liwicki
Abstract	We introduce a novel uncertainty estimation method for classification tasks in Bayesian convolutional neural networks with variational inference. By normalizing the output of a Softplus function in the final layer, we estimate aleatoric and epistemic uncertainty in a coherent manner. The intractable posterior probability distributions over weights are inferred by Bayes by Backprop. Firstly, we demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures. On multiple datasets in supervised learning settings (MNIST, CIFAR-10, CIFAR-100), this variational inference method achieves performances equivalent to frequentist inference in identical architectures, while the two desiderata, a measure of uncertainty and regularization, are incorporated naturally. Secondly, we examine how our proposed measure for aleatoric and epistemic uncertainties is derived and validate it on the aforementioned datasets.
Tasks	Bayesian Inference
Published	2018-06-15
URL	https://arxiv.org/abs/1806.05978v6
PDF	https://arxiv.org/pdf/1806.05978v6.pdf
PWC	https://paperswithcode.com/paper/bayesian-convolutional-neural-networks-with-1
Repo	https://github.com/kumar-shridhar/PyTorch-Softplus-Normalization-Uncertainty-Estimation-Bayesian-CNN
Framework	pytorch
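
The key step named in the abstract is normalizing Softplus outputs in the final layer instead of applying a softmax. In the minimal sketch below, softplus_normalize implements that normalization, and uncertainty_decomposition splits the predictive variance over T stochastic forward passes into aleatoric and epistemic parts using the per-class (diagonal) form of a common variance decomposition. Whether this matches the paper's exact estimator is our assumption, and the sampled logits stand in for forward passes with weights drawn from the variational posterior.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softplus_normalize(logits):
    """Map final-layer activations to probabilities by normalizing Softplus outputs."""
    values = [softplus(z) for z in logits]
    total = sum(values)
    return [v / total for v in values]


def uncertainty_decomposition(sampled_logits):
    """Per-class aleatoric and epistemic variance from T stochastic forward passes.

    sampled_logits: list of T logit vectors, e.g. one per weight sample.
    Uses the diagonals of  mean_t[diag(p_t) - p_t p_t^T]  (aleatoric) and
    mean_t[(p_t - p_bar)(p_t - p_bar)^T]  (epistemic).
    """
    probs = [softplus_normalize(z) for z in sampled_logits]
    t, k = len(probs), len(probs[0])
    p_bar = [sum(p[i] for p in probs) / t for i in range(k)]
    aleatoric = [sum(p[i] - p[i] ** 2 for p in probs) / t for i in range(k)]
    epistemic = [sum((p[i] - p_bar[i]) ** 2 for p in probs) / t for i in range(k)]
    return p_bar, aleatoric, epistemic
```

Epistemic variance vanishes when every weight sample gives the same prediction, while aleatoric variance stays high whenever the individual predictions are themselves spread across classes.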

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing


Title	StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
Authors	Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig
Abstract	Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns from both limited amounts of parallel data and readily available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models.
Tasks	Code Generation, Latent Variable Models, Semantic Parsing
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07832v1
PDF	http://arxiv.org/pdf/1806.07832v1.pdf
PWC	https://paperswithcode.com/paper/structvae-tree-structured-latent-variable
Repo	https://github.com/pcyin/tranX
Framework	pytorch

SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth


Title	SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth
Authors	Yuankai Huo, Zhoubing Xu, Hyeonsoo Moon, Shunxing Bao, Albert Assad, Tamara K. Moyo, Michael R. Savona, Richard G. Abramson, Bennett A. Landman
Abstract	A key limitation of deep convolutional neural network (DCNN) based image segmentation methods is the lack of generalizability. Manually traced training images are typically required when segmenting organs in a new imaging modality or from a distinct disease cohort. The manual efforts can be alleviated if the manually traced images in one imaging modality (e.g., MRI) are able to train a segmentation network for another imaging modality (e.g., CT). In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a segmentation network for a target imaging modality without having manual labels. SynSeg-Net is trained by using (1) unpaired intensity images from source and target modalities, and (2) manual labels only from the source modality. SynSeg-Net is enabled by recent advances in cycle generative adversarial networks (CycleGAN) and DCNN. We evaluate the performance of SynSeg-Net in two experiments: (1) MRI to CT splenomegaly synthetic segmentation for abdominal images, and (2) CT to MRI total intracranial volume (TICV) synthetic segmentation for brain images. The proposed end-to-end approach achieved performance superior to two-stage methods. Moreover, SynSeg-Net achieved performance comparable to the traditional segmentation network using target modality labels in certain scenarios. The source code of SynSeg-Net is publicly available (https://github.com/MASILab/SynSeg-Net).
Tasks	Semantic Segmentation
Published	2018-10-15
URL	https://arxiv.org/abs/1810.06498v2
PDF	https://arxiv.org/pdf/1810.06498v2.pdf
PWC	https://paperswithcode.com/paper/synseg-net-synthetic-segmentation-without
Repo	https://github.com/MASILab/SynSeg-Net
Framework	caffe2

Learning Disentangled Joint Continuous and Discrete Representations


Title	Learning Disentangled Joint Continuous and Discrete Representations
Authors	Emilien Dupont
Abstract	We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent.
Tasks
Published	2018-03-31
URL	http://arxiv.org/abs/1804.00104v3
PDF	http://arxiv.org/pdf/1804.00104v3.pdf
PWC	https://paperswithcode.com/paper/learning-disentangled-joint-continuous-and
Repo	https://github.com/Schlumberger/joint-vae
Framework	pytorch
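
Two ingredients from the abstract can be made concrete: the relaxed discrete distribution and the control over how much information each latent carries. The sketch below shows a Gumbel-Softmax (Concrete) sample, the standard relaxation of a categorical variable, and a capacity-controlled KL term gamma * |KL - C| whose target C grows linearly during training. The temperature, gamma, capacity schedule and function names are illustrative assumptions, not settings taken from the Schlumberger/joint-vae repository.

```python
import math
import random


def gumbel_softmax_sample(log_probs, temperature=0.67, rng=random):
    """Draw a relaxed one-hot sample from a categorical given its log-probabilities."""
    noisy = []
    for lp in log_probs:
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        noisy.append((lp - math.log(-math.log(u))) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def capacity(step, max_capacity=5.0, total_steps=10000):
    """Linearly increase the allowed KL (in nats) from 0 to max_capacity."""
    return min(max_capacity, max_capacity * step / total_steps)


def capacity_kl_term(kl, step, gamma=30.0, **schedule):
    """Penalize deviation of the KL divergence from the current capacity target."""
    return gamma * abs(kl - capacity(step, **schedule))
```

Lower temperatures push samples toward exact one-hot vectors at the cost of higher-variance gradients; the capacity term lets each latent absorb information gradually instead of all at once.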

WaveGlow: A Flow-based Generative Network for Speech Synthesis


Title	WaveGlow: A Flow-based Generative Network for Speech Synthesis
Authors	Ryan Prenger, Rafael Valle, Bryan Catanzaro
Abstract	In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
Tasks	Speech Synthesis
Published	2018-10-31
URL	http://arxiv.org/abs/1811.00002v1
PDF	http://arxiv.org/pdf/1811.00002v1.pdf
PWC	https://paperswithcode.com/paper/waveglow-a-flow-based-generative-network-for
Repo	https://github.com/yanggeng1995/WaveGlow
Framework	tf
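
WaveGlow avoids auto-regression by stacking invertible layers whose inverse and log-determinant are cheap to compute. The core building block, an affine coupling layer, is sketched below in pure Python on a single vector; the toy coupling_net stands in for WaveGlow's WaveNet-like network (which is also conditioned on the mel-spectrogram) and is purely hypothetical, as are the function names.

```python
import math


def coupling_net(x_a):
    """Hypothetical stand-in for the WaveNet-like network: returns (log_s, t) per element."""
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x):
    """Affine coupling: x_a passes through unchanged, x_b is scaled and shifted."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = coupling_net(x_a)
    y_b = [math.exp(ls) * xb + tt for ls, xb, tt in zip(log_s, x_b, t)]
    log_det = sum(log_s)  # triangular Jacobian: log|det| is the sum of log-scales
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Exact inverse: recompute (log_s, t) from the untouched half and undo the affine map."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = coupling_net(y_a)
    x_b = [(yb - tt) * math.exp(-ls) for ls, yb, tt in zip(log_s, y_b, t)]
    return y_a + x_b
```

Training maximizes the Gaussian log-likelihood of the transformed audio plus the summed log-determinants, which is the single cost function the abstract mentions; synthesis runs the inverse on Gaussian noise.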

Personalizing Similar Product Recommendations in Fashion E-commerce


Title	Personalizing Similar Product Recommendations in Fashion E-commerce
Authors	Pankaj Agarwal, Sreekanth Vempati, Sumit Borar
Abstract	In fashion e-commerce platforms, product discovery is one of the key components of a good user experience. People find the products they desire in numerous ways, and similar product recommendations are one of the popular modes through which users find products that resonate with their intent. Generally, these recommendations are not personalized to a specific user. Traditionally, collaborative filtering based approaches have been popular in the literature for recommending non-personalized products given a query product. There has also been a focus on personalizing the product listing for a given user. In this paper, we marry these approaches so that users are recommended personalized similar products. Our experimental results on a large fashion e-commerce platform (Myntra) show that we can improve the key metrics by applying personalization to similar product recommendations.
Tasks
Published	2018-06-29
URL	https://arxiv.org/abs/1806.11371v1
PDF	https://arxiv.org/pdf/1806.11371v1.pdf
PWC	https://paperswithcode.com/paper/personalizing-similar-product-recommendations
Repo	https://github.com/manohar029/Ecommerce-Implicit-Data-Recommender-System
Framework	none
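
The abstract describes combining non-personalized similar-product retrieval with per-user personalization but does not give the combination rule. Purely as an illustration, the sketch below re-ranks candidate products by a hypothetical linear blend of item-to-item similarity and user affinity; the weight alpha, the cosine similarity over embedding vectors, and all names are assumptions, not the method deployed at Myntra.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_similar(query_vec, candidates, user_vec, alpha=0.7, k=3):
    """Rank candidates by alpha * sim(query, item) + (1 - alpha) * sim(user, item).

    candidates: dict mapping product id -> embedding vector.
    user_vec: hypothetical user-preference embedding in the same space.
    """
    scored = []
    for pid, vec in candidates.items():
        score = alpha * cosine(query_vec, vec) + (1.0 - alpha) * cosine(user_vec, vec)
        scored.append((score, pid))
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]
```

Setting alpha = 1 recovers the non-personalized similar-products list, and lowering it shifts the ranking toward items that match the user's own preferences.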