July 30, 2019

Paper Group AWR 5

HDR image reconstruction from a single exposure using deep CNNs

Title HDR image reconstruction from a single exposure using deep CNNs
Authors Gabriel Eilertsen, Joel Kronander, Gyorgy Denes, Rafał K. Mantiuk, Jonas Unger
Abstract Camera sensors can only capture a limited range of luminance simultaneously, and in order to create high dynamic range (HDR) images a set of different exposures is typically combined. In this paper we address the problem of predicting information that has been lost in saturated image areas, in order to enable HDR reconstruction from a single exposure. We show that this problem is well-suited for deep learning algorithms, and propose a deep convolutional neural network (CNN) that is specifically designed taking into account the challenges in predicting HDR values. To train the CNN we gather a large dataset of HDR images, which we augment by simulating sensor saturation for a range of cameras. To further boost robustness, we pre-train the CNN on a simulated HDR dataset created from a subset of the MIT Places database. We demonstrate that our approach can reconstruct high-resolution visually convincing HDR results in a wide range of situations, and that it generalizes well to reconstruction of images captured with arbitrary and low-end cameras that use unknown camera response functions and post-processing. Furthermore, we compare to existing methods for HDR expansion, and show high quality results also for image based lighting. Finally, we evaluate the results in a subjective experiment performed on an HDR display. This shows that the reconstructed HDR images are visually convincing, with large improvements as compared to existing methods.
Tasks Image Reconstruction
Published 2017-10-20
URL http://arxiv.org/abs/1710.07480v1
PDF http://arxiv.org/pdf/1710.07480v1.pdf
PWC https://paperswithcode.com/paper/hdr-image-reconstruction-from-a-single
Repo https://github.com/gabrieleilertsen/hdrcnn
Framework tf
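
The saturation augmentation mentioned in the abstract can be pictured with a minimal sketch. It assumes linear luminance values, a single exposure scale factor, and a sensor maximum of 1.0. The function name `simulate_saturation` and its parameters are illustrative and are not taken from the paper or the hdrcnn repository; camera response functions and noise are ignored.

```python
def simulate_saturation(hdr_pixels, exposure):
    """Scale linear luminance by an exposure factor and clip at a sensor maximum of 1.0.

    Returns the clipped (low dynamic range) values and a mask marking the
    pixels whose original information was lost to saturation.
    """
    scaled = [p * exposure for p in hdr_pixels]
    ldr = [min(1.0, s) for s in scaled]
    saturated = [s >= 1.0 for s in scaled]
    return ldr, saturated


# Hypothetical luminance values: two in range, two brighter than the sensor can record.
ldr, mask = simulate_saturation([0.1, 0.5, 2.0, 8.0], exposure=1.0)
```

The masked pixels are the ones a single-exposure reconstruction method would have to predict.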

Machine Learning for RealisticBall Detection in RoboCup SPL

Title Machine Learning for RealisticBall Detection in RoboCup SPL
Authors Domenico Bloisi, Francesco Del Duchetto, Tiziano Manoni, Vincenzo Suriani
Abstract In this technical report, we describe the use of a machine learning approach for detecting the realistic black and white ball currently in use in the RoboCup Standard Platform League. Our aim is to provide a ready-to-use software module that can be useful for the RoboCup SPL community. To this end, the approach is integrated within the official B-Human code release 2016. The complete code for the approach presented in this work can be downloaded from the SPQR Team homepage at http://spqr.diag.uniroma1.it and from the SPQR Team GitHub repository at https://github.com/SPQRTeam/SPQRBallPerceptor. The approach has been tested in multiple environments, both indoor and outdoor. Furthermore, the ball detector described in this technical report has been used by the SPQR Robot Soccer Team during the competitions of the RoboCup German Open 2017. To facilitate the use of our code by other teams, we have prepared a step-by-step installation guide.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03628v1
PDF http://arxiv.org/pdf/1707.03628v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-realisticball-detection
Repo https://github.com/SPQRTeam/SPQRBallPerceptor
Framework none

Shifting Mean Activation Towards Zero with Bipolar Activation Functions

Title Shifting Mean Activation Towards Zero with Bipolar Activation Functions
Authors Lars Eidnes, Arild Nøkland
Abstract We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero. Combined with proper weight initialization, this alleviates the need for normalization layers. We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting. On the Penn Treebank and Text8 language modeling tasks we obtain competitive results, improving on the best reported results for non-gated networks. In experiments with convolutional neural networks without batch normalization, we find that bipolar activations produce a faster drop in training error, and result in a lower test error on the CIFAR-10 classification task.
Tasks Language Modelling
Published 2017-09-12
URL http://arxiv.org/abs/1709.04054v3
PDF http://arxiv.org/pdf/1709.04054v3.pdf
PWC https://paperswithcode.com/paper/shifting-mean-activation-towards-zero-with
Repo https://github.com/larspars/word-rnn
Framework torch
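
The abstract does not spell out how the extension works. The sketch below assumes one plausible construction: even-indexed units use the activation f(x) and odd-indexed units use its mirrored form -f(-x), so positive and negative outputs can cancel across a layer. This reading, and the helper names `relu`, `bipolar` and `mean`, are assumptions for illustration and are not taken from the abstract or the linked repository.

```python
def relu(x):
    return max(0.0, x)


def bipolar(f, xs):
    """Apply f to even-indexed units and the mirrored -f(-x) to odd-indexed units."""
    return [f(x) if i % 2 == 0 else -f(-x) for i, x in enumerate(xs)]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical pre-activations of one layer, symmetric around zero.
layer = [1.0, 1.0, -1.0, -1.0, 2.0, 2.0, -2.0, -2.0]
plain = [relu(x) for x in layer]   # all outputs are >= 0, so the mean is positive
bip = bipolar(relu, layer)         # half the units can output negative values
```

For this input the plain ReLU layer has mean 0.75, while the bipolar version has mean 0.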

How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Title How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?
Authors Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker
Abstract The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional basic pieces of information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model’s performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.
Tasks Representation Learning
Published 2017-10-05
URL http://arxiv.org/abs/1710.02238v2
PDF http://arxiv.org/pdf/1710.02238v2.pdf
PWC https://paperswithcode.com/paper/how-much-chemistry-does-a-deep-neural-network
Repo https://github.com/Bunseki2/DeepL
Framework none

Log-DenseNet: How to Sparsify a DenseNet

Title Log-DenseNet: How to Sparsify a DenseNet
Authors Hanzhang Hu, Debadeepta Dey, Allison Del Giorno, Martial Hebert, J. Andrew Bagnell
Abstract Skip connections are increasingly utilized by deep neural networks to improve accuracy and cost-efficiency. In particular, the recent DenseNet is efficient in computation and parameters, and achieves state-of-the-art predictions by directly connecting each feature layer to all previous ones. However, DenseNet’s extreme connectivity pattern may hinder its scalability to high depths, and in applications like fully convolutional networks, full DenseNet connections are prohibitively expensive. This work first experimentally shows that one key advantage of skip connections is to have short distances among feature layers during backpropagation. Specifically, using a fixed number of skip connections, the connection patterns with shorter backpropagation distance among layers have more accurate predictions. Following this insight, we propose a connection template, Log-DenseNet, which, in comparison to DenseNet, only slightly increases the backpropagation distances among layers from 1 to ($1 + \log_2 L$), but uses only $L\log_2 L$ total connections instead of $O(L^2)$. Hence, Log-DenseNets are easier than DenseNets to implement and to scale. We demonstrate the effectiveness of our design principle by showing better performance than DenseNets on tabula rasa semantic segmentation, and competitive results on visual recognition.
Tasks Semantic Segmentation
Published 2017-10-30
URL http://arxiv.org/abs/1711.00002v1
PDF http://arxiv.org/pdf/1711.00002v1.pdf
PWC https://paperswithcode.com/paper/log-densenet-how-to-sparsify-a-densenet
Repo https://github.com/agassi4013/Log-DenseNet-Tensorflow
Framework tf
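
To make the connection counts in the abstract concrete, here is a minimal sketch. It assumes layers are indexed from 0 and that layer i reads from layers i - 1, i - 2, i - 4, and so on (powers of two back), which is one reading of the Log-DenseNet template; the exact indexing in the paper may differ, and the function names are illustrative.

```python
import math


def log_dense_inputs(i):
    """Earlier layers feeding layer i: i - 1, i - 2, i - 4, ... while the index stays >= 0."""
    inputs = []
    step = 1
    while i - step >= 0:
        inputs.append(i - step)
        step *= 2
    return inputs


def dense_inputs(i):
    """Full DenseNet pattern: layer i reads from every earlier layer."""
    return list(range(i))


def count_connections(num_layers, inputs_fn):
    return sum(len(inputs_fn(i)) for i in range(num_layers))


L = 64
log_total = count_connections(L, log_dense_inputs)    # grows like L * log2(L)
dense_total = count_connections(L, dense_inputs)      # grows like L^2 / 2
```

With 64 layers the dense pattern needs 2016 connections, while the logarithmic pattern stays under the L log2 L = 384 bound.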

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

Title Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark
Authors Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
Abstract This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over state-of-the-art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides more dense and complete point clouds with a much higher overall number of labelled points compared to those already available to the research community. We further provide baseline method descriptions and comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
Tasks Object Detection, Semantic Segmentation
Published 2017-04-12
URL http://arxiv.org/abs/1704.03847v1
PDF http://arxiv.org/pdf/1704.03847v1.pdf
PWC https://paperswithcode.com/paper/semantic3dnet-a-new-large-scale-point-cloud
Repo https://github.com/nsavinov/semantic3dnet
Framework torch

EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification

Title EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
Authors Patrick Helber, Benjamin Bischke, Andreas Dengel, Damian Borth
Abstract In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible and are provided by the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.
Tasks
Published 2017-08-31
URL http://arxiv.org/abs/1709.00029v2
PDF http://arxiv.org/pdf/1709.00029v2.pdf
PWC https://paperswithcode.com/paper/eurosat-a-novel-dataset-and-deep-learning
Repo https://github.com/pashu123/Satellite-Data-Analysis
Framework none

On orthogonality and learning recurrent networks with long term dependencies

Title On orthogonality and learning recurrent networks with long term dependencies
Authors Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, Chris Pal
Abstract It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and therein control the degree of expansivity induced during backpropagation. We find that hard constraints on orthogonality can negatively affect the speed of convergence and model performance.
Tasks
Published 2017-01-31
URL http://arxiv.org/abs/1702.00071v4
PDF http://arxiv.org/pdf/1702.00071v4.pdf
PWC https://paperswithcode.com/paper/on-orthogonality-and-learning-recurrent
Repo https://github.com/veugene/spectre_release
Framework none
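
The norm-preservation claim in the abstract can be checked on a tiny example. The sketch below uses a 2x2 rotation as the orthogonal matrix and, as an assumed example of a soft constraint, the penalty ||W^T W - I||_F^2. The paper's own factorization and parameterization are not reproduced here, and all names are illustrative.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def orthogonality_penalty(W):
    """Squared Frobenius norm of W^T W - I; zero when the columns of W are orthonormal."""
    cols = transpose(W)
    total = 0.0
    for i, ci in enumerate(cols):
        for j, cj in enumerate(cols):
            gram = sum(a * b for a, b in zip(ci, cj))
            total += (gram - (1.0 if i == j else 0.0)) ** 2
    return total


theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
scaled = [[1.5 * w for w in row] for row in rotation]   # expansive, not orthogonal

grad = [3.0, 4.0]                                        # hypothetical upstream gradient, norm 5
back_orth = norm(matvec(transpose(rotation), grad))      # backprop multiplies by W^T
back_scaled = norm(matvec(transpose(scaled), grad))
```

The orthogonal matrix leaves the gradient norm at 5, while the scaled matrix grows it to 7.5 in a single step, which compounds over many layers or time steps.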

Autoregressive Convolutional Neural Networks for Asynchronous Time Series

Title Autoregressive Convolutional Neural Networks for Asynchronous Time Series
Authors Mikołaj Bińkowski, Gautier Marti, Philippe Donnat
Abstract We propose Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of adjusted regressors, while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series and is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index, an artificially generated noisy autoregressive series and the UCI household electricity consumption dataset. The proposed architecture achieves promising results as compared to convolutional and recurrent neural networks.
Tasks Time Series
Published 2017-03-12
URL http://arxiv.org/abs/1703.04122v4
PDF http://arxiv.org/pdf/1703.04122v4.pdf
PWC https://paperswithcode.com/paper/autoregressive-convolutional-neural-networks
Repo https://github.com/Fangyh09/Autoregressive-Convolutional-Neural-Networks.Pytorch
Framework pytorch
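
The "weighted sum of adjusted regressors" in the abstract can be sketched in a few lines. This is a simplified, scalar illustration: the scores and offsets are hard-coded stand-ins for what the convolutional network would output, and normalizing the scores with a softmax is an assumption made here, not something the abstract states.

```python
import math


def softmax(scores):
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prediction(past_values, offsets, scores):
    """Weighted sum of adjusted regressors (past value + offset) with data-dependent weights."""
    weights = softmax(scores)
    adjusted = [x + o for x, o in zip(past_values, offsets)]
    return sum(w * a for w, a in zip(weights, adjusted))


# Equal scores reduce to a plain average of the adjusted past values.
uniform = weighted_prediction([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
# A dominant score on the most recent observation makes the prediction follow it.
recent = weighted_prediction([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [0.0, 0.0, 50.0])
```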

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

Title CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Authors Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng
Abstract We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on which we compare the performance of CheXNet to that of radiologists. We find that CheXNet exceeds average radiologist performance on the F1 metric. We extend CheXNet to detect all 14 diseases in ChestX-ray14 and achieve state of the art results on all 14 diseases.
Tasks Pneumonia Detection
Published 2017-11-14
URL http://arxiv.org/abs/1711.05225v3
PDF http://arxiv.org/pdf/1711.05225v3.pdf
PWC https://paperswithcode.com/paper/chexnet-radiologist-level-pneumonia-detection
Repo https://github.com/zoogzog/chexnet
Framework pytorch
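
For readers unfamiliar with the F1 metric used in the comparison above, a minimal sketch of the standard binary definition follows (harmonic mean of precision and recall). The labels are made up for illustration and have nothing to do with the ChestX-ray14 data or the paper's evaluation protocol.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0                         # no true positives: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = pneumonia present, 0 = absent.
truth = [1, 1, 1, 0, 0]
preds = [1, 1, 0, 1, 0]
score = f1_score(truth, preds)             # precision 2/3, recall 2/3
```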

Accelerated Alternating Projections for Robust Principal Component Analysis

Title Accelerated Alternating Projections for Robust Principal Component Analysis
Authors HanQin Cai, Jian-Feng Cai, Ke Wei
Abstract We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014] when updating the low rank factor. The acceleration is achieved by first projecting a matrix onto some low dimensional subspace before obtaining a new estimate of the low rank matrix via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05519v4
PDF http://arxiv.org/pdf/1711.05519v4.pdf
PWC https://paperswithcode.com/paper/accelerated-alternating-projections-for
Repo https://github.com/caesarcai/AccAltProj_for_RPCA
Framework none
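
As a rough orientation, a generic and simplified alternating projections iteration for robust PCA can be written as below. The operator symbols are notation introduced here, not taken from the abstract: $\mathcal{T}_r$ denotes rank-$r$ truncated SVD, and $\mathcal{H}_{\zeta}$ denotes hard thresholding that zeroes entries with magnitude below $\zeta$.

```latex
\begin{aligned}
\boldsymbol{L}_{k+1} &= \mathcal{T}_r\left(\boldsymbol{D} - \boldsymbol{S}_k\right), \\
\boldsymbol{S}_{k+1} &= \mathcal{H}_{\zeta_{k+1}}\left(\boldsymbol{D} - \boldsymbol{L}_{k+1}\right).
\end{aligned}
```

According to the abstract, the accelerated variant first projects $\boldsymbol{D} - \boldsymbol{S}_k$ onto a low dimensional subspace before the truncated SVD; the details of that subspace and of the threshold schedule are not given in this listing.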

Almost instant brain atlas segmentation for large-scale studies

Title Almost instant brain atlas segmentation for large-scale studies
Authors Alex Fedorov, Eswar Damaraju, Vince Calhoun, Sergey Plis
Abstract Large scale studies of group differences in healthy controls and patients and screenings for early stage disease prevention programs require processing and analysis of extensive multisubject datasets. Complexity of the task increases even further when segmenting structural MRI of the brain into an atlas with more than 50 regions. Current automatic approaches are time-consuming and hardly scalable; they often involve many error prone intermediate steps and don’t utilize other available modalities. To alleviate these problems, we propose a feedforward fully convolutional neural network trained on the output produced by the state of the art models. The speed of the neural network on readily available powerful GPUs makes this analysis much easier and faster (from $>10$ hours to a minute). The proposed model is more than two orders of magnitude faster than the state of the art and yet as accurate. We have evaluated the network’s performance by comparing it with the state of the art in the task of differentiating region volumes of healthy controls and patients with schizophrenia on a dataset with 311 subjects. This comparison provides strong evidence that speed did not harm the accuracy. The overall quality may also be increased by utilizing multi-modal datasets (not an easy task for other models) by simply adding more modalities as an input. Our model will be useful in large-scale studies as well as in clinical care solutions, where it can significantly reduce delay between the patient screening and the result.
Tasks
Published 2017-11-01
URL http://arxiv.org/abs/1711.00457v1
PDF http://arxiv.org/pdf/1711.00457v1.pdf
PWC https://paperswithcode.com/paper/almost-instant-brain-atlas-segmentation-for
Repo https://github.com/Entodi/meshnet-pytorch
Framework pytorch

Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension

Title Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
Authors David Golub, Po-Sen Huang, Xiaodong He, Li Deng
Abstract We develop a technique for transfer learning in machine comprehension (MC) using a novel two-stage synthesis network (SynNet). Given a high-performing MC model in one domain, our technique aims to answer questions about documents in another domain, where we use no labeled data of question-answer pairs. Using the proposed SynNet with a pretrained model from the SQuAD dataset on the challenging NewsQA dataset, we achieve an F1 measure of 44.3% with a single model and 46.6% with an ensemble, approaching performance of in-domain models (F1 measure of 50.0%) and outperforming the out-of-domain baseline of 7.6%, without use of provided annotations.
Tasks Reading Comprehension, Transfer Learning
Published 2017-06-29
URL http://arxiv.org/abs/1706.09789v3
PDF http://arxiv.org/pdf/1706.09789v3.pdf
PWC https://paperswithcode.com/paper/two-stage-synthesis-networks-for-transfer
Repo https://github.com/davidgolub/QuestionGeneration
Framework pytorch

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

Title CoupleNet: Coupling Global Structure with Local Parts for Object Detection
Authors Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, Hanqing Lu
Abstract The region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN or R-FCN have already shown promising results for object detection by combining the region proposal subnetwork and the classification subnetwork together. Although R-FCN has achieved higher detection speed while keeping the detection performance, the global structure information is ignored by the position-sensitive score maps. To fully explore the local and global properties, in this paper, we propose a novel fully convolutional network, named CoupleNet, to couple the global structure with local parts for object detection. Specifically, the object proposals obtained by the Region Proposal Network (RPN) are fed into the coupling module which consists of two branches. One branch adopts the position-sensitive RoI (PSRoI) pooling to capture the local part information of the object, while the other employs the RoI pooling to encode the global and context information. Next, we design different coupling strategies and normalization methods to make full use of the complementary advantages between the global and local branches. Extensive experiments demonstrate the effectiveness of our approach. We achieve state-of-the-art results on all three challenging datasets, i.e. a mAP of 82.7% on VOC07, 80.4% on VOC12, and 34.4% on COCO. Codes will be made publicly available.
Tasks Object Detection
Published 2017-08-09
URL http://arxiv.org/abs/1708.02863v1
PDF http://arxiv.org/pdf/1708.02863v1.pdf
PWC https://paperswithcode.com/paper/couplenet-coupling-global-structure-with
Repo https://github.com/tshizys/CoupleNet
Framework caffe2

Contextually Customized Video Summaries via Natural Language

Title Contextually Customized Video Summaries via Natural Language
Authors Jinsoo Choi, Tae-Hyun Oh, In So Kweon
Abstract The best summary of a long video differs among different people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. First, we train a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data via a progressive and residual manner. Given a user-specific text description, our algorithm is able to select semantically relevant video segments and produce a temporally aligned video summary. In order to evaluate our textually customized video summaries, we conduct experimental comparison with baseline methods that utilize ground-truth information. Despite the challenging baselines, our method still manages to show comparable or even exceeding performance. We also show that our method is able to generate semantically diverse video summaries by only utilizing the learned visual embeddings.
Tasks
Published 2017-02-06
URL http://arxiv.org/abs/1702.01528v3
PDF http://arxiv.org/pdf/1702.01528v3.pdf
PWC https://paperswithcode.com/paper/contextually-customized-video-summaries-via
Repo https://github.com/jinsc37/Page-VidSumm
Framework none