Paper Group AWR 239
Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network. Photo-Realistic Blocksworld Dataset. Where are the Blobs: Counting by Localization with Point Supervision. Improving Object Counting with Heatmap Regulation. Perceptual deep depth super-resolution. DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation. Augmenting Neural Response Generation with Context-Aware Topical Attention. A high-bias, low-variance introduction to Machine Learning for physicists. A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis. Densely Connected Pyramid Dehazing Network. Image Generation from Scene Graphs. Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing. Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network. End-to-End Multi-Task Learning with Attention. Conditional molecular design with deep generative models.
Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network
Title | Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network |
Authors | Charis Lanaras, José Bioucas-Dias, Silvano Galliani, Emmanuel Baltsavias, Konrad Schindler |
Abstract | The Sentinel-2 satellite mission delivers multi-spectral imagery with 13 spectral bands, acquired at three different spatial resolutions. The aim of this research is to super-resolve the lower-resolution (20 m and 60 m Ground Sampling Distance - GSD) bands to 10 m GSD, so as to obtain a complete data cube at the maximal sensor resolution. We employ a state-of-the-art convolutional neural network (CNN) to perform end-to-end upsampling, which is trained with data at lower resolution, i.e., from 40 m to 20 m and from 360 m to 60 m GSD, respectively. In this way, one has access to a virtually infinite amount of training data, by downsampling real Sentinel-2 images. We use data sampled globally over a wide range of geographical locations, to obtain a network that generalises across different climate zones and land-cover types, and can super-resolve arbitrary Sentinel-2 images without the need for retraining. In quantitative evaluations (at lower scale, where ground truth is available), our network, which we call DSen2, outperforms the best competing approach by almost 50% in RMSE, while better preserving the spectral characteristics. It also delivers visually convincing results at the full 10 m GSD. The code is available at https://github.com/lanha/DSen2 |
Tasks | Super-Resolution |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04271v2 |
PDF | http://arxiv.org/pdf/1803.04271v2.pdf |
PWC | https://paperswithcode.com/paper/super-resolution-of-sentinel-2-images |
Repo | https://github.com/deephdc/satsr |
Framework | tf |
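The reduced-scale training trick described in the abstract is easy to reproduce: ground truth comes for free by downsampling real bands. A minimal numpy sketch, assuming a simple block-average downsampler (the released DSen2 code uses its own, more careful resampling pipeline):

```python
import numpy as np

def block_average(band, factor):
    """Downsample a 2-D band by averaging non-overlapping factor x factor blocks."""
    h, w = band.shape
    h, w = h - h % factor, w - w % factor
    return band[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Reduced-scale training pair: the original 20 m band is the target and its
# 2x-downsampled (40 m) version is the network input, so supervision is free.
band_20m = np.random.rand(256, 256).astype(np.float32)  # stand-in for a real 20 m band
x_40m = block_average(band_20m, 2)   # network input, simulating 40 m GSD
y_20m = band_20m                     # supervision target at the true 20 m GSD
```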
Photo-Realistic Blocksworld Dataset
Title | Photo-Realistic Blocksworld Dataset |
Authors | Masataro Asai |
Abstract | In this report, we introduce an artificial dataset generator for the Photo-realistic Blocksworld domain. Blocksworld is one of the oldest high-level task planning domains; it is well defined yet contains sufficient complexity, e.g., conflicting subgoals and decomposability into subproblems. We aim to make this dataset a benchmark for neural-symbolic integrated systems and to accelerate research in this area. The key advantage of such systems is the ability to obtain a symbolic model from real-world input and to run a fast, systematic, complete algorithm for symbolic reasoning, without any supervision or reward signal from the environment. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01818v1 |
PDF | http://arxiv.org/pdf/1812.01818v1.pdf |
PWC | https://paperswithcode.com/paper/photo-realistic-blocksworld-dataset |
Repo | https://github.com/ibm/photorealistic-blocksworld |
Framework | none |
Where are the Blobs: Counting by Localization with Point Supervision
Title | Where are the Blobs: Counting by Localization with Point Supervision |
Authors | Issam H. Laradji, Negar Rostamzadeh, Pedro O. Pinheiro, David Vazquez, Mark Schmidt |
Abstract | Object counting is an important task in computer vision due to its growing demand in applications such as surveillance, traffic monitoring, and counting everyday objects. State-of-the-art methods use regression-based optimization where they explicitly learn to count the objects of interest. These often perform better than detection-based methods that need to learn the more difficult task of predicting the location, size, and shape of each object. However, we propose a detection-based method that does not need to estimate the size and shape of the objects and that outperforms regression-based methods. Our contributions are three-fold: (1) we propose a novel loss function that encourages the network to output a single blob per object instance using point-level annotations only; (2) we design two methods for splitting large predicted blobs between object instances; and (3) we show that our method achieves new state-of-the-art results on several challenging datasets including the Pascal VOC and the Penguins dataset. Our method even outperforms those that use stronger supervision such as depth features, multi-point annotations, and bounding-box labels. |
Tasks | Object Counting |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09856v1 |
PDF | http://arxiv.org/pdf/1807.09856v1.pdf |
PWC | https://paperswithcode.com/paper/where-are-the-blobs-counting-by-localization |
Repo | https://github.com/ElementAI/LCFCN |
Framework | pytorch |
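The full LCFCN objective combines four terms (image-level, point-level, split-level, and false-positive); the PyTorch sketch below implements only the first two to show the flavor of point supervision. The shapes and the -1 background convention are assumptions of this sketch, not the repository's API:

```python
import torch
import torch.nn.functional as F

def point_supervision_loss(logits, points):
    """Two of LCFCN's four loss terms. logits: (1, C, H, W) per-pixel class
    scores; points: (H, W) tensor holding a class id at each annotated pixel
    and -1 everywhere else."""
    probs = F.softmax(logits, dim=1)
    loss = logits.new_zeros(())
    # Image-level: every annotated class must appear somewhere in the image.
    for c in points[points >= 0].unique():
        loss = loss - torch.log(probs[0, int(c)].max())
    # Point-level: the annotated pixels themselves must be classified correctly.
    ys, xs = torch.nonzero(points >= 0, as_tuple=True)
    if len(ys) > 0:
        loss = loss + F.cross_entropy(logits[0, :, ys, xs].t(), points[ys, xs].long())
    return loss
```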
Improving Object Counting with Heatmap Regulation
Title | Improving Object Counting with Heatmap Regulation |
Authors | Shubhra Aich, Ian Stavness |
Abstract | In this paper, we propose a simple and effective way to improve one-look regression models for object counting from images. We use class activation map visualizations to illustrate the drawbacks of learning a pure one-look regression model for a counting task. Based on these insights, we enhance one-look regression counting models by regulating activation maps from the final convolution layer of the network with coarse ground-truth activation maps generated from simple dot annotations. We call this strategy heatmap regulation (HR). We show that this simple enhancement effectively suppresses false detections generated by the corresponding one-look baseline model and also improves the performance in terms of false negatives. Evaluations are performed on four different counting datasets — two for car counting (CARPK, PUCPR+), one for crowd counting (WorldExpo) and another for biological cell counting (VGG-Cells). Adding HR to a simple VGG front-end improves performance on all these benchmarks compared to a simple one-look baseline model and results in state-of-the-art performance for car counting. |
Tasks | Crowd Counting, Object Counting |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05494v2 |
PDF | http://arxiv.org/pdf/1803.05494v2.pdf |
PWC | https://paperswithcode.com/paper/improving-object-counting-with-heatmap |
Repo | https://github.com/littleaich/heatmap-regulation |
Framework | pytorch |
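A hedged sketch of the heatmap-regulation idea: the usual count regression loss is augmented with an L1 term that pulls the final-layer activation map toward a coarse heatmap derived from dot annotations. The average-pool smoothing and the weight alpha below are illustrative stand-ins for the paper's exact choices:

```python
import torch
import torch.nn.functional as F

def heatmap_regulation_loss(count_pred, count_gt, activation_map, dot_map, alpha=1.0):
    """Count regression plus an L1 pull of the final conv layer's activation
    map toward a coarse ground-truth heatmap built from dot annotations.
    dot_map: (B, 1, H, W) with ones at annotated object centers."""
    heatmap_gt = F.avg_pool2d(dot_map, kernel_size=9, stride=1, padding=4)  # crude smoothing
    return F.l1_loss(count_pred, count_gt) + alpha * F.l1_loss(activation_map, heatmap_gt)
```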
Perceptual deep depth super-resolution
Title | Perceptual deep depth super-resolution |
Authors | Oleg Voynov, Alexey Artemov, Vage Egiazarian, Alexander Notchenko, Gleb Bobrovskikh, Denis Zorin, Evgeny Burnaev |
Abstract | RGBD images, combining high-resolution color and lower-resolution depth from various types of depth sensors, are increasingly common. One can significantly improve the resolution of depth maps by taking advantage of color information; deep learning methods make combining color and depth information particularly easy. However, fusing these two sources of data may lead to a variety of artifacts. If depth maps are used to reconstruct 3D shapes, e.g., for virtual reality applications, the visual quality of upsampled images is particularly important. The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. We demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics. We compare this approach with a number of existing optimization and learning-based techniques. |
Tasks | Super-Resolution |
Published | 2018-12-24 |
URL | https://arxiv.org/abs/1812.09874v3 |
PDF | https://arxiv.org/pdf/1812.09874v3.pdf |
PWC | https://paperswithcode.com/paper/perceptually-based-single-image-depth-super |
Repo | https://github.com/twhui/MSG-Net |
Framework | none |
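The paper's key move is to compare renderings of the reconstructed surfaces rather than raw depth values. The sketch below is a much cruder stand-in, assuming finite-difference normals and a single fixed directional light instead of a full renderer, but it shows the shape of an appearance-based loss:

```python
import torch
import torch.nn.functional as F

def depth_to_shading(depth, light=(0.3, 0.3, 0.9)):
    """Finite-difference normals from a (B, 1, H, W) depth map, shaded with a
    fixed directional (Lambertian) light; a crude differentiable 'rendering'."""
    dzdx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1, 0, 0))
    dzdy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)  # (B, 3, H, W)
    n = n / n.norm(dim=1, keepdim=True)
    l = torch.tensor(light)
    l = l / l.norm()
    return (n * l.view(1, 3, 1, 1)).sum(dim=1)  # per-pixel shading, (B, H, W)

def appearance_loss(depth_pred, depth_gt):
    """Compare shaded appearances instead of depth values directly."""
    return F.l1_loss(depth_to_shading(depth_pred), depth_to_shading(depth_gt))
```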
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
Title | DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation |
Authors | Lex Fridman, Jack Terwilliger, Benedikt Jenik |
Abstract | We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowd-sourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space. |
Tasks | Autonomous Driving, Autonomous Navigation, Q-Learning |
Published | 2018-01-09 |
URL | http://arxiv.org/abs/1801.02805v2 |
PDF | http://arxiv.org/pdf/1801.02805v2.pdf |
PWC | https://paperswithcode.com/paper/deeptraffic-crowdsourced-hyperparameter |
Repo | https://github.com/asarav/MIT-Deep-Traffic-Solution |
Framework | none |
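Plain random search over a hypothetical hyperparameter space gives a feel for what the crowd was doing by hand; the variable names and ranges below are illustrative, not the simulator's actual tunables:

```python
import random

# Hypothetical search space in the spirit of the competition's knobs
# (perception-grid extent, network size, exploration schedule).
SPACE = {
    "lanes_side": [1, 2, 3],        # lanes the agent perceives to each side
    "patches_ahead": [10, 20, 40],  # lookahead of the perception grid
    "num_hidden_layers": [1, 2, 3],
    "hidden_units": [16, 32, 64, 128],
    "gamma": [0.9, 0.95, 0.99],
    "epsilon_decay": [0.99, 0.995, 0.999],
}

def evaluate(cfg):
    """Stand-in for training a DQN with cfg and returning average speed."""
    return random.random()  # replace with an actual simulation rollout

def random_search(n=100):
    """Crowdsourcing replaces this loop with thousands of human tuners."""
    configs = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(n)]
    return max(configs, key=evaluate)

print(random_search())
```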
Augmenting Neural Response Generation with Context-Aware Topical Attention
Title | Augmenting Neural Response Generation with Context-Aware Topical Attention |
Authors | Nouha Dziri, Ehsan Kamalloo, Kory W. Mathewson, Osmar Zaiane |
Abstract | Sequence-to-Sequence (Seq2Seq) models have witnessed a notable success in generating natural conversational exchanges. Notwithstanding the syntactically well-formed responses generated by these neural network models, they are prone to be acontextual, short and generic. In this work, we introduce a Topical Hierarchical Recurrent Encoder Decoder (THRED), a novel, fully data-driven, multi-turn response generation system intended to produce contextual and topic-aware responses. Our model is built upon the basic Seq2Seq model by augmenting it with a hierarchical joint attention mechanism that incorporates topical concepts and previous interactions into the response generation. To train our model, we provide a clean and high-quality conversational dataset mined from Reddit comments. We evaluate THRED on two novel automated metrics, dubbed Semantic Similarity and Response Echo Index, as well as with human evaluation. Our experiments demonstrate that the proposed model is able to generate more diverse and contextually relevant responses compared to the strong baselines. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-11-02 |
URL | https://arxiv.org/abs/1811.01063v2 |
PDF | https://arxiv.org/pdf/1811.01063v2.pdf |
PWC | https://paperswithcode.com/paper/augmenting-neural-response-generation-with |
Repo | https://github.com/nouhadziri/THRED |
Framework | tf |
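A sketch of one joint-attention step under simplifying assumptions (single dot-product attention, pre-projected keys): the decoder state attends separately over utterance-context states and topic-word embeddings, and the two summaries are concatenated. The real THRED model (https://github.com/nouhadziri/THRED) is a full hierarchical Seq2Seq system:

```python
import torch
import torch.nn.functional as F

def joint_attention(dec_state, ctx_keys, topic_keys):
    """One joint-attention step. dec_state: (B, D); ctx_keys: (B, Tc, D)
    hierarchical context states; topic_keys: (B, Tk, D) topic-word embeddings."""
    a_ctx = F.softmax(torch.bmm(ctx_keys, dec_state.unsqueeze(2)).squeeze(2), dim=1)
    a_top = F.softmax(torch.bmm(topic_keys, dec_state.unsqueeze(2)).squeeze(2), dim=1)
    ctx_vec = torch.bmm(a_ctx.unsqueeze(1), ctx_keys).squeeze(1)    # context summary
    top_vec = torch.bmm(a_top.unsqueeze(1), topic_keys).squeeze(1)  # topic summary
    return torch.cat([ctx_vec, top_vec], dim=1)  # (B, 2D), fed to the decoder
```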
A high-bias, low-variance introduction to Machine Learning for physicists
Title | A high-bias, low-variance introduction to Machine Learning for physicists |
Authors | Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G. R. Day, Clint Richardson, Charles K. Fisher, David J. Schwab |
Abstract | Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python Jupyter notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists may be able to contribute. (Notebooks are available at https://physics.bu.edu/~pankajm/MLnotebooks.html ) |
Tasks | |
Published | 2018-03-23 |
URL | https://arxiv.org/abs/1803.08823v3 |
PDF | https://arxiv.org/pdf/1803.08823v3.pdf |
PWC | https://paperswithcode.com/paper/a-high-bias-low-variance-introduction-to |
Repo | https://github.com/vmartinezalvarez/Machine-Learning-for-physicists |
Framework | pytorch |
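For readers new to the review's opening topic, the bias-variance tradeoff fits in a few lines of numpy: under- and over-parameterized polynomial fits to noisy samples of a smooth function, scored on fresh data. The degrees and noise level here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(np.pi * x_train) + 0.2 * rng.normal(size=20)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

# Degree 1 underfits (high bias); degree 15 fits the noise (high variance).
for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: test MSE = {mse:.3f}")
```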
A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis
Title | A Tutorial on Modeling and Inference in Undirected Graphical Models for Hyperspectral Image Analysis |
Authors | Utsav B. Gewali, Sildomar T. Monteiro |
Abstract | Undirected graphical models have been successfully used to jointly model the spatial and the spectral dependencies in earth observing hyperspectral images. They produce less noisy, smooth, and spatially coherent land cover maps and give top accuracies on many datasets. Moreover, they can easily be combined with other state-of-the-art approaches, such as deep learning. This has made them an essential tool for remote sensing researchers and practitioners. However, graphical models have not been easily accessible to the larger remote sensing community as they are not discussed in standard remote sensing textbooks and not included in the popular remote sensing software and toolboxes. In this tutorial, we provide a theoretical introduction to Markov random field and conditional random field based spatial-spectral classification for land cover mapping, along with a detailed step-by-step practical guide on applying these methods using freely available software. Furthermore, the discussed methods are benchmarked on four public hyperspectral datasets for a fair comparison among themselves and easy comparison with the vast number of methods in the literature that use the same datasets. The source code necessary to reproduce all the results in the paper is published online to make it easier for the readers to apply these techniques to different remote sensing problems. |
Tasks | |
Published | 2018-01-25 |
URL | http://arxiv.org/abs/1801.08268v1 |
PDF | http://arxiv.org/pdf/1801.08268v1.pdf |
PWC | https://paperswithcode.com/paper/a-tutorial-on-modeling-and-inference-in |
Repo | https://github.com/UBGewali/tutorial-UGM-hyperspectral |
Framework | none |
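As a taste of the MRF machinery the tutorial walks through, here is iterated conditional modes (ICM) with a Potts smoothness prior over a per-pixel classification; it is written for clarity, not speed, and is not the tutorial's own code:

```python
import numpy as np

def icm_potts(unary, beta=1.0, n_iters=5):
    """Iterated conditional modes with a Potts smoothness prior.
    unary: (H, W, C) per-pixel class costs, e.g. negative log-probabilities
    from any spectral classifier."""
    h, w, c = unary.shape
    labels = unary.argmin(axis=2)  # start from the pixel-wise classification
    for _ in range(n_iters):
        for i in range(h):
            for j in range(w):
                nbrs = [labels[ii, jj]
                        for ii, jj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= ii < h and 0 <= jj < w]
                cost = unary[i, j].copy()
                for k in range(c):
                    cost[k] += beta * sum(n != k for n in nbrs)  # Potts disagreement penalty
                labels[i, j] = cost.argmin()
    return labels

# Example: smooth a noisy 3-class map from random unary costs.
smooth_map = icm_potts(np.random.rand(32, 32, 3), beta=0.8)
```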
Densely Connected Pyramid Dehazing Network
Title | Densely Connected Pyramid Dehazing Network |
Authors | He Zhang, Vishal M. Patel |
Abstract | We propose a new end-to-end single image dehazing method, called Densely Connected Pyramid Dehazing Network (DCPDN), which can jointly learn the transmission map, atmospheric light and dehazing all together. The end-to-end learning is achieved by directly embedding the atmospheric scattering model into the network, thereby ensuring that the proposed method strictly follows the physics-driven scattering model for dehazing. Inspired by the dense network that can maximize the information flow along features from different levels, we propose a new edge-preserving densely connected encoder-decoder structure with multi-level pyramid pooling module for estimating the transmission map. This network is optimized using a newly introduced edge-preserving loss function. To further incorporate the mutual structural information between the estimated transmission map and the dehazed result, we propose a joint-discriminator based on generative adversarial network framework to decide whether the corresponding dehazed image and the estimated transmission map are real or fake. An ablation study is conducted to demonstrate the effectiveness of each module evaluated at both estimated transmission map and dehazed result. Extensive experiments demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods. Code will be made available at: https://github.com/hezhangsprinter |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08396v1 |
PDF | http://arxiv.org/pdf/1803.08396v1.pdf |
PWC | https://paperswithcode.com/paper/densely-connected-pyramid-dehazing-network |
Repo | https://github.com/hezhangsprinter/DCPDN |
Framework | pytorch |
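The physics the network embeds is the atmospheric scattering model I = J*t + A*(1 - t); once the transmission map t and atmospheric light A are estimated, the dehazed image follows in closed form. A minimal PyTorch sketch of that inversion (the clamp floor is an illustrative safeguard, not the paper's value):

```python
import torch

def dehaze(hazy, transmission, airlight, t_min=0.05):
    """Closed-form inversion of the scattering model I = J * t + A * (1 - t),
    given the network's estimates of t and A: J = (I - A * (1 - t)) / t."""
    t = transmission.clamp(min=t_min)  # floor t to avoid amplifying noise in dense haze
    return (hazy - airlight * (1.0 - t)) / t

# hazy: (B, 3, H, W); transmission: (B, 1, H, W) in (0, 1]; airlight: (B, 3, 1, 1)
J = dehaze(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64), torch.rand(1, 3, 1, 1))
```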
Image Generation from Scene Graphs
Title | Image Generation from Scene Graphs |
Authors | Justin Johnson, Agrim Gupta, Li Fei-Fei |
Abstract | To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We validate our approach on Visual Genome and COCO-Stuff, where qualitative results, ablations, and user studies demonstrate our method’s ability to generate complex images with multiple objects. |
Tasks | Image Generation, Layout-to-Image Generation |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01622v1 |
PDF | http://arxiv.org/pdf/1804.01622v1.pdf |
PWC | https://paperswithcode.com/paper/image-generation-from-scene-graphs |
Repo | https://github.com/google/sg2im |
Framework | pytorch |
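A sketch of the graph-convolution step in the spirit of the paper: each (subject, predicate, object) triple passes through a shared MLP, and an object's update averages its candidate vectors over all triples it appears in. The layer sizes and pooling details are assumptions of this sketch, not the released sg2im code:

```python
import torch
import torch.nn as nn

class TripleGraphConv(nn.Module):
    """One scene-graph convolution layer over (subject, predicate, object) triples."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * dim, 3 * dim), nn.ReLU())

    def forward(self, obj_vecs, pred_vecs, edges):
        # obj_vecs: (N, D); pred_vecs: (E, D); edges: (E, 2) of (subj, obj) indices
        s, o = edges[:, 0], edges[:, 1]
        h_s, h_p, h_o = self.mlp(
            torch.cat([obj_vecs[s], pred_vecs, obj_vecs[o]], dim=1)).chunk(3, dim=1)
        # Average each object's candidate vectors across all triples it joins.
        pooled = torch.zeros_like(obj_vecs).index_add(0, s, h_s).index_add(0, o, h_o)
        ones = torch.ones(len(s), 1)
        counts = torch.zeros(obj_vecs.size(0), 1).index_add(0, s, ones).index_add(0, o, ones)
        return pooled / counts.clamp(min=1), h_p  # updated object and predicate vectors
```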
Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing
Title | Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing |
Authors | Deniz Engin, Anıl Genç, Hazım Kemal Ekenel |
Abstract | In this paper, we present an end-to-end network, called Cycle-Dehaze, for single image dehazing problem, which does not require pairs of hazy and corresponding ground truth images for training. That is, we train the network by feeding clean and hazy images in an unpaired manner. Moreover, the proposed approach does not rely on estimation of the atmospheric scattering model parameters. Our method enhances CycleGAN formulation by combining cycle-consistency and perceptual losses in order to improve the quality of textural information recovery and generate visually better haze-free images. Typically, deep learning models for dehazing take low resolution images as input and produce low resolution outputs. However, in the NTIRE 2018 challenge on single image dehazing, high resolution images were provided. Therefore, we apply bicubic downscaling. After obtaining low-resolution outputs from the network, we utilize the Laplacian pyramid to upscale the output images to the original resolution. We conduct experiments on NYU-Depth, I-HAZE, and O-HAZE datasets. Extensive experiments demonstrate that the proposed approach improves CycleGAN method both quantitatively and qualitatively. |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05308v1 |
PDF | http://arxiv.org/pdf/1805.05308v1.pdf |
PWC | https://paperswithcode.com/paper/cycle-dehaze-enhanced-cyclegan-for-single |
Repo | https://github.com/engindeniz/Cycle-Dehaze |
Framework | tf |
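The Laplacian-pyramid upscaling step can be sketched with OpenCV: detail layers are taken from the full-resolution hazy input and re-added while upsampling the low-resolution dehazed output. This assumes the network output matches the coarsest pyramid level; the paper's exact procedure may differ in detail:

```python
import cv2
import numpy as np

def laplacian_upscale(low_res_dehazed, hazy_full_res, levels=2):
    """Upscale a low-resolution dehazed output to full resolution, re-injecting
    high-frequency detail from a Laplacian pyramid of the hazy input."""
    g = hazy_full_res.astype(np.float32)
    details = []
    for _ in range(levels):  # build the detail (Laplacian) layers, fine to coarse
        down = cv2.pyrDown(g)
        details.append(g - cv2.pyrUp(down, dstsize=(g.shape[1], g.shape[0])))
        g = down
    out = low_res_dehazed.astype(np.float32)  # must match the coarsest level's size
    for detail in reversed(details):          # add detail back, coarse to fine
        out = cv2.pyrUp(out, dstsize=(detail.shape[1], detail.shape[0])) + detail
    return np.clip(out, 0, 255).astype(np.uint8)
```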
Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network
Title | Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network |
Authors | Subeesh Vasu, Nimisha Thekke Madam, A. N. Rajagopalan |
Abstract | Convolutional neural network (CNN) based methods have recently achieved great success for image super-resolution (SR). However, most deep CNN based SR models attempt to improve distortion measures (e.g. PSNR, SSIM, IFC, VIF) while resulting in poor quantified perceptual quality (e.g. human opinion score, no-reference quality measures such as NIQE). Few works have attempted to improve the perceptual quality at the cost of performance reduction in distortion measures. A very recent study has revealed that distortion and perceptual quality are at odds with each other and there is always a trade-off between the two. Often the restoration algorithms that are superior in terms of perceptual quality, are inferior in terms of distortion measures. Our work attempts to analyze the trade-off between distortion and perceptual quality for the problem of single image SR. To this end, we use the well-known SR architecture, the enhanced deep super-resolution (EDSR) network, and show that it can be adapted to achieve better perceptual quality for a specific range of the distortion measure. While the original network of EDSR was trained to minimize the error defined based on per-pixel accuracy alone, we train our network using a generative adversarial network framework with EDSR as the generator module. Our proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss. Our experiments reveal that EPSR achieves the state-of-the-art trade-off between distortion and perceptual quality while the existing methods perform well in either of these measures alone. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00344v2 |
PDF | http://arxiv.org/pdf/1811.00344v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-perception-distortion-tradeoff |
Repo | https://github.com/subeeshvasu/2018_subeesh_epsr_eccvw |
Framework | pytorch |
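The three-term generator objective is straightforward to write down; the weights below are illustrative rather than the paper's, and vgg_features and disc stand for a pretrained feature extractor and the discriminator:

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, vgg_features, disc, w_percep=0.1, w_adv=1e-3):
    """EPSR-style generator objective: per-pixel MSE, a VGG-feature
    (perceptual) distance, and an adversarial term rewarding outputs that
    fool the discriminator."""
    mse = F.mse_loss(sr, hr)
    percep = F.mse_loss(vgg_features(sr), vgg_features(hr))
    logits = disc(sr)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return mse + w_percep * percep + w_adv * adv
```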
End-to-End Multi-Task Learning with Attention
Title | End-to-End Multi-Task Learning with Attention |
Authors | Shikun Liu, Edward Johns, Andrew J. Davison |
Abstract | We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules allow for learning of task-specific features from the global features, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be trained end-to-end and can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. We evaluate our approach on a variety of datasets, across both image-to-image predictions and image classification tasks. We show that our architecture is state-of-the-art in multi-task learning compared to existing methods, and is also less sensitive to various weighting schemes in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan. |
Tasks | Multi-Task Learning |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1803.10704v2 |
PDF | http://arxiv.org/pdf/1803.10704v2.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-multi-task-learning-with-attention |
Repo | https://github.com/lorenmt/mtan |
Framework | pytorch |
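A sketch of one task-specific attention module, assuming 1x1 convolutions and a sigmoid mask (close in spirit to, but not copied from, the released mtan code): each task learns an element-wise soft mask over the shared feature pool:

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """One task-specific attention module: a small conv block produces an
    element-wise soft mask, so each task selects its own view of the shared
    global features."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),  # soft mask in (0, 1)
        )

    def forward(self, shared_features):
        return self.mask(shared_features) * shared_features

# One module per task over the same shared backbone features:
shared = torch.randn(2, 64, 32, 32)
seg_feats = TaskAttention(64)(shared)
depth_feats = TaskAttention(64)(shared)
```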
Conditional molecular design with deep generative models
Title | Conditional molecular design with deep generative models |
Authors | Seokho Kang, Kyunghyun Cho |
Abstract | Although machine learning has been successfully used to propose novel molecules that satisfy desired properties, it is still challenging to explore a large chemical space efficiently. In this paper, we present a conditional molecular design method that facilitates generating new molecules with desired properties. The proposed model, which simultaneously performs both property prediction and molecule generation, is built as a semi-supervised variational autoencoder trained on a set of existing molecules with only a partial annotation. We generate new molecules with desired properties by sampling from the generative distribution estimated by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like molecules. The model improves the performance of property prediction by exploiting unlabeled molecules, and efficiently generates novel molecules fulfilling various target conditions. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00108v3 |
PDF | http://arxiv.org/pdf/1805.00108v3.pdf |
PWC | https://paperswithcode.com/paper/conditional-molecular-design-with-deep |
Repo | https://github.com/gcolmenarejo/cmd |
Framework | tf |
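The generation step is conditional sampling: draw z from the prior, fix the desired property vector y, and decode (z, y). The decoder below is a hypothetical dense stand-in for the paper's recurrent semi-supervised VAE, with made-up vocabulary and property dimensions:

```python
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Hypothetical stand-in decoder: maps (z, y) to a token sequence over a
    SMILES-like vocabulary, just to show the conditional-generation step."""
    def __init__(self, z_dim=64, y_dim=3, vocab=32, seq_len=60):
        super().__init__()
        self.z_dim, self.vocab, self.seq_len = z_dim, vocab, seq_len
        self.net = nn.Sequential(
            nn.Linear(z_dim + y_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * vocab))

    @torch.no_grad()
    def sample(self, y, n=10):
        z = torch.randn(n, self.z_dim)                    # sample latent from the prior
        logits = self.net(torch.cat([z, y.expand(n, -1)], dim=1))
        return logits.view(n, self.seq_len, self.vocab).argmax(-1)

decoder = ConditionalDecoder()
y_target = torch.tensor([[0.5, -1.0, 2.0]])  # hypothetical target property values
token_ids = decoder.sample(y_target)         # n candidate molecules as token ids
```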