Paper Group AWR 40
The NES Music Database: A multi-instrumental dataset with expressive performance attributes
Title | The NES Music Database: A multi-instrumental dataset with expressive performance attributes |
Authors | Chris Donahue, Huanru Henry Mao, Julian McAuley |
Abstract | Existing research on music generation focuses on composition, but often ignores the expressive performance characteristics required for plausible renditions of resultant pieces. In this paper, we introduce the Nintendo Entertainment System Music Database (NES-MDB), a large corpus allowing for separate examination of the tasks of composition and performance. NES-MDB contains thousands of multi-instrumental songs composed for playback by the compositionally-constrained NES audio synthesizer. For each song, the dataset contains a musical score for four instrument voices as well as expressive attributes for the dynamics and timbre of each voice. Unlike datasets comprised of General MIDI files, NES-MDB includes all of the information needed to render exact acoustic performances of the original compositions. Alongside the dataset, we provide a tool that renders generated compositions as NES-style audio by emulating the device’s audio processor. Additionally, we establish baselines for the tasks of composition, which consists of learning the semantics of composing for the NES synthesizer, and performance, which involves finding a mapping between a composition and realistic expressive attributes. |
Tasks | Music Generation |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04278v1 |
PDF | http://arxiv.org/pdf/1806.04278v1.pdf |
PWC | https://paperswithcode.com/paper/the-nes-music-database-a-multi-instrumental |
Repo | https://github.com/chrisdonahue/nesmdb |
Framework | none |
SING: Symbol-to-Instrument Neural Generator
Title | SING: Symbol-to-Instrument Neural Generator |
Authors | Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach |
Abstract | Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16 kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet, as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2,500 times faster for inference. |
Tasks | Music Generation |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09785v1 |
PDF | http://arxiv.org/pdf/1810.09785v1.pdf |
PWC | https://paperswithcode.com/paper/sing-symbol-to-instrument-neural-generator |
Repo | https://github.com/facebookresearch/SING |
Framework | pytorch |
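The SING abstract describes a loss that compares log spectrograms of the generated and target waveforms. The sketch below illustrates that idea in plain Python. The naive framed DFT, the rectangular window, the frame and hop sizes, the use of a mean absolute difference, and the `eps` offset are all assumptions made for illustration; they are not taken from the SING repository.

```python
import cmath
import math

def power_spectrogram(signal, frame_size=8, hop=4):
    """Naive framed DFT power spectrogram (rectangular window)."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        bins = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            bins.append(abs(coeff) ** 2)
        frames.append(bins)
    return frames

def log_spectral_distance(generated, target, eps=1.0):
    """Mean absolute difference between log(eps + power) spectrograms."""
    spec_g = power_spectrogram(generated)
    spec_t = power_spectrogram(target)
    total, count = 0.0, 0
    for row_g, row_t in zip(spec_g, spec_t):
        for p_g, p_t in zip(row_g, row_t):
            total += abs(math.log(eps + p_g) - math.log(eps + p_t))
            count += 1
    return total / count

# Toy signals: the same tone, a phase-shifted copy, and a different pitch.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
shifted = [math.sin(2 * math.pi * 2 * n / 8 + 1.0) for n in range(16)]
other = [math.sin(2 * math.pi * 3 * n / 8) for n in range(16)]
```

On these toy signals the distance between `tone` and `shifted` is close to zero, while `tone` and `other` are far apart. A sample-by-sample waveform loss would instead penalize the phase shift, which is one plausible motivation for a spectral loss.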
Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network
Title | Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network |
Authors | Hao-Min Liu, Yi-Hsuan Yang |
Abstract | Research on automatic music generation has seen great progress due to the development of deep neural networks. However, the generation of multi-instrument music of arbitrary genres remains a challenge. Existing research works on either lead sheets or multi-track piano-rolls found in MIDIs, but both musical notations have their limits. In this work, we propose a new task called lead sheet arrangement to avoid such limits. A new recurrent convolutional generative model for the task is proposed, along with three new symbolic-domain harmonic features to facilitate learning from unpaired lead sheets and MIDIs. Our model can generate eight-bar lead sheets and their arrangements. Audio samples of the generated result can be found at https://drive.google.com/open?id=1c0FfODTpudmLvuKBbc23VBCgQizY6-Rk |
Tasks | Music Generation |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11161v1 |
PDF | http://arxiv.org/pdf/1807.11161v1.pdf |
PWC | https://paperswithcode.com/paper/lead-sheet-generation-and-arrangement-by |
Repo | https://github.com/liuhaumin/leadsheetgan |
Framework | tf |
Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3
Title | Car Detection using Unmanned Aerial Vehicles: Comparison between Faster R-CNN and YOLOv3 |
Authors | Bilel Benjdira, Taha Khursheed, Anis Koubaa, Adel Ammar, Kais Ouni |
Abstract | Unmanned Aerial Vehicles are increasingly being used in surveillance and traffic monitoring thanks to their high mobility and ability to cover areas at different altitudes and locations. One of the major challenges is to use aerial images to accurately detect cars and count them in real-time for traffic monitoring purposes. Several deep learning techniques based on convolutional neural networks (CNNs) were recently proposed for real-time classification and recognition in computer vision. However, their performance depends on the scenarios where they are used. In this paper, we investigate the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN and YOLOv3, in the context of car detection from aerial images. We trained and tested these two models on a large car dataset taken from UAVs. We demonstrate that YOLOv3 outperforms Faster R-CNN in sensitivity and processing time, although they are comparable in the precision metric. |
Tasks | |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10968v1 |
PDF | http://arxiv.org/pdf/1812.10968v1.pdf |
PWC | https://paperswithcode.com/paper/car-detection-using-unmanned-aerial-vehicles |
Repo | https://github.com/aniskoubaa/car_detection_yolo_faster_rcnn_uvsc2019 |
Framework | tf |
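The abstract compares the two detectors on sensitivity and precision. As a reminder of how these two metrics differ, here is a minimal sketch using the standard definitions. The counts in `detector_a` and `detector_b` are invented for illustration and are not results from the paper.

```python
def precision(tp, fp):
    """Fraction of predicted cars that are real cars."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def sensitivity(tp, fn):
    """Fraction of real cars that the detector found (also called recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts for two detectors on the same 100 ground-truth cars.
detector_a = {"tp": 99, "fp": 1, "fn": 1}
detector_b = {"tp": 80, "fp": 1, "fn": 20}

for name, c in (("A", detector_a), ("B", detector_b)):
    print(name, precision(c["tp"], c["fp"]), sensitivity(c["tp"], c["fn"]))
```

With these made-up counts both detectors have similar precision, but detector B misses many more cars, so its sensitivity is much lower. This is the kind of gap the abstract refers to when it says two models can be comparable in precision yet differ in sensitivity.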
Single Image Reflection Removal Using Deep Encoder-Decoder Network
Title | Single Image Reflection Removal Using Deep Encoder-Decoder Network |
Authors | Zhixiang Chi, Xiaolin Wu, Xiao Shu, Jinjin Gu |
Abstract | An image of a scene captured through a piece of transparent and reflective material, such as glass, is often spoiled by a superimposed reflection layer. While separating the reflection from a familiar object in an image is mentally not difficult for humans, it is a challenging, ill-posed problem in computer vision. In this paper, we propose a novel deep convolutional encoder-decoder method to remove the objectionable reflection by learning a map between image pairs with and without reflection. For training the neural network, we model the physical formation of reflections in images and synthesize a large number of photo-realistic reflection-tainted images from reflection-free images collected online. Extensive experimental results show that, although the neural network learns only from synthetic data, the proposed method is effective on real-world images, and it significantly outperforms the other tested state-of-the-art techniques. |
Tasks | |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1802.00094v1 |
PDF | http://arxiv.org/pdf/1802.00094v1.pdf |
PWC | https://paperswithcode.com/paper/single-image-reflection-removal-using-deep |
Repo | https://github.com/LastReLU/Reflection-Separation |
Framework | pytorch |
StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax
Title | StarAlgo: A Squad Movement Planning Library for StarCraft using Monte Carlo Tree Search and Negamax |
Authors | Mykyta Viazovskyi, Michal Certicky |
Abstract | Real-Time Strategy (RTS) games have recently become a popular testbed for artificial intelligence research. They represent a complex adversarial domain providing a number of interesting AI challenges. There exists a wide variety of research-supporting software tools, libraries and frameworks for one RTS game in particular, StarCraft: Brood War. These tools are designed to address various specific sub-problems, such as resource allocation or opponent modelling, so that researchers can focus exclusively on the tasks relevant to them. We present one such tool, a library called StarAlgo, that produces plans for the coordinated movement of squads (groups of combat units) within the game world. The StarAlgo library can solve the squad movement planning problem using one of two algorithms: Monte Carlo Tree Search Considering Durations (MCTSCD) and a slightly modified version of Negamax. We evaluate both algorithms, compare them, and demonstrate their usage. The library is implemented as a static C++ library that can be easily plugged into most StarCraft AI bots. |
Tasks | Real-Time Strategy Games, Starcraft |
Published | 2018-12-29 |
URL | http://arxiv.org/abs/1812.11371v1 |
PDF | http://arxiv.org/pdf/1812.11371v1.pdf |
PWC | https://paperswithcode.com/paper/staralgo-a-squad-movement-planning-library |
Repo | https://github.com/Games-and-Simulations/StarAlgo |
Framework | none |
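The abstract says StarAlgo uses a slightly modified version of Negamax. For readers unfamiliar with the base algorithm, the sketch below shows the textbook form on a toy game tree. It is not the library's variant (StarAlgo itself is a C++ library), and the nested-list tree representation is an assumption chosen for brevity.

```python
def negamax(node, color):
    """Return the best achievable score from `node` for the player to move.

    `node` is either a number (leaf score from the first player's point of
    view) or a list of child nodes. `color` is +1 when the first player moves
    and -1 when the opponent moves.
    """
    if not isinstance(node, list):
        return color * node
    # Each child's value is from the opponent's perspective, so negate it.
    return max(-negamax(child, -color) for child in node)

# Toy two-ply tree: the first player picks a branch, the opponent picks a leaf.
toy_tree = [[3, 5], [2, 9], [0, 7]]
best = negamax(toy_tree, 1)
print(best)
```

Negamax relies on the zero-sum identity max(a, b) = -min(-a, -b), which lets a single recursive function serve both players instead of separate maximizing and minimizing branches.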
How Does Batch Normalization Help Optimization?
Title | How Does Batch Normalization Help Optimization? |
Authors | Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry |
Abstract | Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm’s effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers’ input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11604v5 |
PDF | http://arxiv.org/pdf/1805.11604v5.pdf |
PWC | https://paperswithcode.com/paper/how-does-batch-normalization-help |
Repo | https://github.com/peteraugustine/seg3 |
Framework | none |
Know What You Don’t Know: Unanswerable Questions for SQuAD
Title | Know What You Don’t Know: Unanswerable Questions for SQuAD |
Authors | Pranav Rajpurkar, Robin Jia, Percy Liang |
Abstract | Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.03822v1 |
PDF | http://arxiv.org/pdf/1806.03822v1.pdf |
PWC | https://paperswithcode.com/paper/know-what-you-dont-know-unanswerable |
Repo | https://github.com/leozhoujf/DataSciComp |
Framework | none |
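The abstract notes that a system must both answer when possible and abstain when the paragraph supports no answer. The sketch below shows one simple way such behavior can be scored and implemented. It is a simplified illustration: the F1 function skips the text normalization an official evaluation script would apply, and `answer_or_abstain` with its `threshold` parameter is a hypothetical helper, not part of any SQuAD tooling.

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-overlap F1; an empty string means 'no answer'."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Abstaining is right only when the question is truly unanswerable.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(pred_tokens)
    r = overlap / len(gold_tokens)
    return 2 * p * r / (p + r)

def answer_or_abstain(best_span, span_score, no_answer_score, threshold=0.0):
    """Return the span only if it beats the no-answer score by `threshold`."""
    return best_span if span_score - no_answer_score > threshold else ""
```

Under this scoring, a model that always guesses a span gets zero credit on every unanswerable question, which is why systems tuned only on answerable data can lose a large share of their F1.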
Learning Confidence for Out-of-Distribution Detection in Neural Networks
Title | Learning Confidence for Out-of-Distribution Detection in Neural Networks |
Authors | Terrance DeVries, Graham W. Taylor |
Abstract | Modern neural networks are very powerful predictive models, but they are often incapable of recognizing when their predictions may be wrong. Closely related to this is the task of out-of-distribution detection, where a network must determine whether or not an input is outside of the set on which it is expected to safely perform. To jointly address these issues, we propose a method of learning confidence estimates for neural networks that is simple to implement and produces intuitively interpretable outputs. We demonstrate that on the task of out-of-distribution detection, our technique surpasses recently proposed techniques which construct confidence based on the network’s output distribution, without requiring any additional labels or access to out-of-distribution examples. Additionally, we address the problem of calibrating out-of-distribution detectors, where we demonstrate that misclassified in-distribution examples can be used as a proxy for out-of-distribution examples. |
Tasks | Out-of-Distribution Detection |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04865v1 |
PDF | http://arxiv.org/pdf/1802.04865v1.pdf |
PWC | https://paperswithcode.com/paper/learning-confidence-for-out-of-distribution |
Repo | https://github.com/nathanin/ood |
Framework | tf |
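The abstract does not spell out how the confidence estimate is learned. The sketch below reflects one reading of the paper's approach: the network emits a confidence value c in (0, 1), its predicted probabilities are interpolated toward the true label in proportion to 1 - c, and a -log(c) penalty discourages asking for such hints. The function name and the weight `lam=0.1` are assumptions for illustration.

```python
import math

def confidence_loss(probs, target, confidence, lam=0.1):
    """Task loss on confidence-interpolated probabilities plus a -log(c) penalty."""
    blended = [
        confidence * p + (1.0 - confidence) * (1.0 if i == target else 0.0)
        for i, p in enumerate(probs)
    ]
    task_loss = -math.log(blended[target])
    penalty = -math.log(confidence)
    return task_loss + lam * penalty

# A wrong prediction: lowering confidence reduces the total loss.
wrong_high = confidence_loss([0.2, 0.8], target=0, confidence=1.0)
wrong_low = confidence_loss([0.2, 0.8], target=0, confidence=0.5)

# A correct prediction: keeping confidence high is cheaper.
right_high = confidence_loss([0.9, 0.1], target=0, confidence=1.0)
right_low = confidence_loss([0.9, 0.1], target=0, confidence=0.5)
```

The trade-off gives the network an incentive to output low confidence only where its prediction is likely wrong, which is what makes the value usable as an out-of-distribution score.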
Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference
Title | Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference |
Authors | Kumar Shridhar, Felix Laumann, Marcus Liwicki |
Abstract | We introduce a novel uncertainty estimation for classification tasks for Bayesian convolutional neural networks with variational inference. By normalizing the output of a Softplus function in the final layer, we estimate aleatoric and epistemic uncertainty in a coherent manner. The intractable posterior probability distributions over weights are inferred by Bayes by Backprop. Firstly, we demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures. On multiple datasets in supervised learning settings (MNIST, CIFAR-10, CIFAR-100), this variational inference method achieves performance equivalent to frequentist inference in identical architectures, while the two desiderata, a measure for uncertainty and regularization, are incorporated naturally. Secondly, we examine how our proposed measure for aleatoric and epistemic uncertainties is derived and validate it on the aforementioned datasets. |
Tasks | Bayesian Inference |
Published | 2018-06-15 |
URL | https://arxiv.org/abs/1806.05978v6 |
PDF | https://arxiv.org/pdf/1806.05978v6.pdf |
PWC | https://paperswithcode.com/paper/bayesian-convolutional-neural-networks-with-1 |
Repo | https://github.com/kumar-shridhar/PyTorch-Softplus-Normalization-Uncertainty-Estimation-Bayesian-CNN |
Framework | pytorch |
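The abstract mentions normalizing the output of a Softplus function in the final layer. A minimal sketch of that normalization is shown below, assuming it means dividing each Softplus activation by the sum over classes. The function names are illustrative and are not taken from the linked repository.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_normalize(logits):
    """Divide each Softplus activation by their sum to get class probabilities."""
    activations = [softplus(z) for z in logits]
    total = sum(activations)
    return [a / total for a in activations]

probs = softplus_normalize([2.0, -1.0, 0.5])
print(probs)
```

Like Softmax, the result is positive and sums to one, but Softplus grows roughly linearly for large inputs rather than exponentially, so a single large logit dominates the distribution less sharply.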
StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
Title | StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing |
Authors | Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig |
Abstract | Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data and from readily available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models. |
Tasks | Code Generation, Latent Variable Models, Semantic Parsing |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07832v1 |
PDF | http://arxiv.org/pdf/1806.07832v1.pdf |
PWC | https://paperswithcode.com/paper/structvae-tree-structured-latent-variable |
Repo | https://github.com/pcyin/tranX |
Framework | pytorch |
SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth
Title | SynSeg-Net: Synthetic Segmentation Without Target Modality Ground Truth |
Authors | Yuankai Huo, Zhoubing Xu, Hyeonsoo Moon, Shunxing Bao, Albert Assad, Tamara K. Moyo, Michael R. Savona, Richard G. Abramson, Bennett A. Landman |
Abstract | A key limitation of image segmentation methods based on deep convolutional neural networks (DCNNs) is the lack of generalizability. Manually traced training images are typically required when segmenting organs in a new imaging modality or from a distinct disease cohort. The manual effort can be alleviated if the manually traced images in one imaging modality (e.g., MRI) are able to train a segmentation network for another imaging modality (e.g., CT). In this paper, we propose an end-to-end synthetic segmentation network (SynSeg-Net) to train a segmentation network for a target imaging modality without having manual labels. SynSeg-Net is trained by using (1) unpaired intensity images from source and target modalities, and (2) manual labels only from source modality. SynSeg-Net is enabled by the recent advances of cycle generative adversarial networks (CycleGAN) and DCNN. We evaluate the performance of the SynSeg-Net on two experiments: (1) MRI to CT splenomegaly synthetic segmentation for abdominal images, and (2) CT to MRI total intracranial volume synthetic segmentation (TICV) for brain images. The proposed end-to-end approach achieved superior performance to two-stage methods. Moreover, the SynSeg-Net achieved comparable performance to the traditional segmentation network using target modality labels in certain scenarios. The source code of SynSeg-Net is publicly available (https://github.com/MASILab/SynSeg-Net). |
Tasks | Semantic Segmentation |
Published | 2018-10-15 |
URL | https://arxiv.org/abs/1810.06498v2 |
PDF | https://arxiv.org/pdf/1810.06498v2.pdf |
PWC | https://paperswithcode.com/paper/synseg-net-synthetic-segmentation-without |
Repo | https://github.com/MASILab/SynSeg-Net |
Framework | caffe2 |
Learning Disentangled Joint Continuous and Discrete Representations
Title | Learning Disentangled Joint Continuous and Discrete Representations |
Authors | Emilien Dupont |
Abstract | We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent. |
Tasks | |
Published | 2018-03-31 |
URL | http://arxiv.org/abs/1804.00104v3 |
PDF | http://arxiv.org/pdf/1804.00104v3.pdf |
PWC | https://paperswithcode.com/paper/learning-disentangled-joint-continuous-and |
Repo | https://github.com/Schlumberger/joint-vae |
Framework | pytorch |
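The abstract refers to a relaxed discrete distribution without naming it. A common choice for this role is the Gumbel-Softmax (Concrete) relaxation, sketched below in plain Python. Treat the choice of relaxation, the function name, and the temperature value as assumptions for illustration rather than a description of the linked repository.

```python
import math
import random

def gumbel_softmax_sample(log_probs, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical distribution."""
    # Gumbel(0, 1) noise via inverse transform sampling; the bounds keep
    # both logarithms well defined.
    noisy = [
        (lp - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))) / temperature
        for lp in log_probs
    ]
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
sample = gumbel_softmax_sample(log_probs, temperature=0.5, rng=rng)
print(sample)
```

Each sample is a point on the probability simplex. As the temperature approaches zero the samples approach one-hot vectors, and the index of the largest entry follows the original categorical distribution, which is what allows gradients to flow through a discrete latent code during training.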
WaveGlow: A Flow-based Generative Network for Speech Synthesis
Title | WaveGlow: A Flow-based Generative Network for Speech Synthesis |
Authors | Ryan Prenger, Rafael Valle, Bryan Catanzaro |
Abstract | In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online. |
Tasks | Speech Synthesis |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00002v1 |
PDF | http://arxiv.org/pdf/1811.00002v1.pdf |
PWC | https://paperswithcode.com/paper/waveglow-a-flow-based-generative-network-for |
Repo | https://github.com/yanggeng1995/WaveGlow |
Framework | tf |
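The abstract describes WaveGlow as a flow-based network trained by maximizing likelihood. Flow models in the Glow family are built from invertible layers with cheap log-determinants, such as the affine coupling layer sketched below. This is a toy illustration only: `toy_conditioner` is a hypothetical stand-in for the neural network that predicts scale and shift, and the mel-spectrogram conditioning is omitted.

```python
import math

def toy_conditioner(x_a):
    """Hypothetical stand-in for the network that predicts log-scale and shift."""
    log_s = [math.tanh(v) for v in x_a]
    t = [0.5 * v for v in x_a]
    return log_s, t

def coupling_forward(x):
    """Transform the second half of x conditioned on the first half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_conditioner(x_a)
    y_b = [math.exp(ls) * xb + ti for ls, xb, ti in zip(log_s, x_b, t)]
    log_det = sum(log_s)  # log |det Jacobian| of the transform
    return x_a + y_b, log_det

def coupling_inverse(y):
    """Exactly undo coupling_forward without any iterative solve."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_conditioner(y_a)
    x_b = [(yb - ti) * math.exp(-ls) for ls, yb, ti in zip(log_s, y_b, t)]
    return y_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
y, log_det = coupling_forward(x)
x_back = coupling_inverse(y)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift and undo the transform in one pass. That property is what lets a flow model generate all samples in parallel instead of one at a time, in contrast to autoregressive synthesis.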
Personalizing Similar Product Recommendations in Fashion E-commerce
Title | Personalizing Similar Product Recommendations in Fashion E-commerce |
Authors | Pankaj Agarwal, Sreekanth Vempati, Sumit Borar |
Abstract | In fashion e-commerce platforms, product discovery is one of the key components of a good user experience. There are numerous ways in which people find the products they desire. Similar product recommendations are one popular way for users to find products that resonate with their intent. Generally, these recommendations are not personalized to a specific user. Traditionally, collaborative filtering based approaches have been popular in the literature for recommending non-personalized products given a query product. There has also been a focus on personalizing the product listing for a given user. In this paper, we marry these approaches so that users are recommended personalized similar products. Our experimental results on a large fashion e-commerce platform (Myntra) show that we can improve the key metrics by applying personalization to similar product recommendations. |
Tasks | |
Published | 2018-06-29 |
URL | https://arxiv.org/abs/1806.11371v1 |
PDF | https://arxiv.org/pdf/1806.11371v1.pdf |
PWC | https://paperswithcode.com/paper/personalizing-similar-product-recommendations |
Repo | https://github.com/manohar029/Ecommerce-Implicit-Data-Recommender-System |
Framework | none |