July 30, 2019

Paper Group AWR 5

HDR image reconstruction from a single exposure using deep CNNs. Machine Learning for RealisticBall Detection in RoboCup SPL. Shifting Mean Activation Towards Zero with Bipolar Activation Functions. How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?. Log-DenseNet: How to Sparsify a DenseNet. Semantic3D.net: A n …

HDR image reconstruction from a single exposure using deep CNNs


Title	HDR image reconstruction from a single exposure using deep CNNs
Authors	Gabriel Eilertsen, Joel Kronander, Gyorgy Denes, Rafał K. Mantiuk, Jonas Unger
Abstract	Camera sensors can only capture a limited range of luminance simultaneously, and in order to create high dynamic range (HDR) images a set of different exposures is typically combined. In this paper we address the problem of predicting information that has been lost in saturated image areas, in order to enable HDR reconstruction from a single exposure. We show that this problem is well-suited for deep learning algorithms, and propose a deep convolutional neural network (CNN) that is specifically designed taking into account the challenges in predicting HDR values. To train the CNN we gather a large dataset of HDR images, which we augment by simulating sensor saturation for a range of cameras. To further boost robustness, we pre-train the CNN on a simulated HDR dataset created from a subset of the MIT Places database. We demonstrate that our approach can reconstruct high-resolution visually convincing HDR results in a wide range of situations, and that it generalizes well to reconstruction of images captured with arbitrary and low-end cameras that use unknown camera response functions and post-processing. Furthermore, we compare to existing methods for HDR expansion, and show high quality results also for image based lighting. Finally, we evaluate the results in a subjective experiment performed on an HDR display. This shows that the reconstructed HDR images are visually convincing, with large improvements as compared to existing methods.
Tasks	Image Reconstruction
Published	2017-10-20
URL	http://arxiv.org/abs/1710.07480v1
PDF	http://arxiv.org/pdf/1710.07480v1.pdf
PWC	https://paperswithcode.com/paper/hdr-image-reconstruction-from-a-single
Repo	https://github.com/gabrieleilertsen/hdrcnn
Framework	tf

Machine Learning for RealisticBall Detection in RoboCup SPL


Title	Machine Learning for RealisticBall Detection in RoboCup SPL
Authors	Domenico Bloisi, Francesco Del Duchetto, Tiziano Manoni, Vincenzo Suriani
Abstract	In this technical report, we describe the use of a machine learning approach for detecting the realistic black and white ball currently in use in the RoboCup Standard Platform League. Our aim is to provide a ready-to-use software module that can be useful for the RoboCup SPL community. To this end, the approach is integrated within the official B-Human code release 2016. The complete code for the approach presented in this work can be downloaded from the SPQR Team homepage at http://spqr.diag.uniroma1.it and from the SPQR Team GitHub repository at https://github.com/SPQRTeam/SPQRBallPerceptor. The approach has been tested in multiple environments, both indoor and outdoor. Furthermore, the ball detector described in this technical report has been used by the SPQR Robot Soccer Team during the competitions of the RoboCup German Open 2017. To facilitate the use of our code by other teams, we have prepared a step-by-step installation guide.
Tasks
Published	2017-07-12
URL	http://arxiv.org/abs/1707.03628v1
PDF	http://arxiv.org/pdf/1707.03628v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-for-realisticball-detection
Repo	https://github.com/SPQRTeam/SPQRBallPerceptor
Framework	none

Shifting Mean Activation Towards Zero with Bipolar Activation Functions


Title	Shifting Mean Activation Towards Zero with Bipolar Activation Functions
Authors	Lars Eidnes, Arild Nøkland
Abstract	We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero. Combined with proper weight initialization, this alleviates the need for normalization layers. We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting. On the Penn Treebank and Text8 language modeling tasks we obtain competitive results, improving on the best reported results for non-gated networks. In experiments with convolutional neural networks without batch normalization, we find that bipolar activations produce a faster drop in training error, and result in a lower test error on the CIFAR-10 classification task.
Tasks	Language Modelling
Published	2017-09-12
URL	http://arxiv.org/abs/1709.04054v3
PDF	http://arxiv.org/pdf/1709.04054v3.pdf
PWC	https://paperswithcode.com/paper/shifting-mean-activation-towards-zero-with
Repo	https://github.com/larspars/word-rnn
Framework	torch
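
The abstract above describes the bipolar extension only in words. A minimal sketch follows, assuming the bipolar variant applies an activation f(x) to even-indexed units of a layer and the mirrored form -f(-x) to odd-indexed units. The function names and the sample values are illustrative assumptions, not taken from the linked repository.

```python
from statistics import mean


def relu(x: float) -> float:
    """Standard ReLU: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)


def bipolar(f, xs):
    """Apply f to even-indexed units and the mirrored -f(-x) to odd-indexed units."""
    return [f(x) if i % 2 == 0 else -f(-x) for i, x in enumerate(xs)]


# Hypothetical pre-activations of one layer, symmetric around zero.
pre_activations = [-1.0, -1.0, 1.0, 1.0]

# Plain ReLU only produces non-negative outputs, so its mean is positive.
plain_mean = mean(relu(x) for x in pre_activations)

# The bipolar version lets half of the units emit negative values instead.
bipolar_mean = mean(bipolar(relu, pre_activations))
```

On this symmetric toy input the plain ReLU mean is positive, while the bipolar mean is zero, which is the shift the title refers to.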

How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?


Title	How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?
Authors	Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker
Abstract	The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks, is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, which is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional basic pieces of information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model’s performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in a manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.
Tasks	Representation Learning
Published	2017-10-05
URL	http://arxiv.org/abs/1710.02238v2
PDF	http://arxiv.org/pdf/1710.02238v2.pdf
PWC	https://paperswithcode.com/paper/how-much-chemistry-does-a-deep-neural-network
Repo	https://github.com/Bunseki2/DeepL
Framework	none

Log-DenseNet: How to Sparsify a DenseNet


Title	Log-DenseNet: How to Sparsify a DenseNet
Authors	Hanzhang Hu, Debadeepta Dey, Allison Del Giorno, Martial Hebert, J. Andrew Bagnell
Abstract	Skip connections are increasingly utilized by deep neural networks to improve accuracy and cost-efficiency. In particular, the recent DenseNet is efficient in computation and parameters, and achieves state-of-the-art predictions by directly connecting each feature layer to all previous ones. However, DenseNet’s extreme connectivity pattern may hinder its scalability to high depths, and in applications like fully convolutional networks, full DenseNet connections are prohibitively expensive. This work first experimentally shows that one key advantage of skip connections is to have short distances among feature layers during backpropagation. Specifically, using a fixed number of skip connections, the connection patterns with shorter backpropagation distance among layers have more accurate predictions. Following this insight, we propose a connection template, Log-DenseNet, which, in comparison to DenseNet, only slightly increases the backpropagation distances among layers from 1 to ($1 + \log_2 L$), but uses only $L\log_2 L$ total connections instead of $O(L^2)$. Hence, Log-DenseNets are easier than DenseNets to implement and to scale. We demonstrate the effectiveness of our design principle by showing better performance than DenseNets on tabula rasa semantic segmentation, and competitive results on visual recognition.
Tasks	Semantic Segmentation
Published	2017-10-30
URL	http://arxiv.org/abs/1711.00002v1
PDF	http://arxiv.org/pdf/1711.00002v1.pdf
PWC	https://paperswithcode.com/paper/log-densenet-how-to-sparsify-a-densenet
Repo	https://github.com/agassi4013/Log-DenseNet-Tensorflow
Framework	tf
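
The connection counts quoted in the abstract can be checked with a small sketch. It assumes that layer i of a Log-DenseNet reads from layers i - 1, i - 2, i - 4, and so on (powers of two, with layer 0 as the network input), and that a DenseNet layer reads from every earlier layer. The exact indexing is an assumption made for illustration, and the helper names are hypothetical.

```python
def log_dense_inputs(i: int) -> list[int]:
    """Layers feeding layer i: i - 1, i - 2, i - 4, ... while the index stays >= 0."""
    sources = []
    step = 1
    while step <= i:
        sources.append(i - step)
        step *= 2
    return sources


def dense_inputs(i: int) -> list[int]:
    """DenseNet pattern: layer i reads from every earlier layer."""
    return list(range(i))


def total_connections(depth: int, inputs_of) -> int:
    """Total number of skip connections in a network with `depth` layers."""
    return sum(len(inputs_of(i)) for i in range(1, depth + 1))


depth = 64
log_total = total_connections(depth, log_dense_inputs)
dense_total = total_connections(depth, dense_inputs)
print(f"Log-DenseNet: {log_total} connections, DenseNet: {dense_total} connections")
```

Under these assumptions the log pattern grows roughly as $L\log_2 L$ while the dense pattern grows quadratically in the depth.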

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark


Title	Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark
Authors	Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys
Abstract	This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a work horse, which already show remarkable performance improvements over state-of-the-art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides more dense and complete point clouds with much higher overall number of labelled points compared to those already available to the research community. We further provide baseline method descriptions and comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
Tasks	Object Detection, Semantic Segmentation
Published	2017-04-12
URL	http://arxiv.org/abs/1704.03847v1
PDF	http://arxiv.org/pdf/1704.03847v1.pdf
PWC	https://paperswithcode.com/paper/semantic3dnet-a-new-large-scale-point-cloud
Repo	https://github.com/nsavinov/semantic3dnet
Framework	torch

EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification


Title	EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
Authors	Patrick Helber, Benjamin Bischke, Andreas Dengel, Damian Borth
Abstract	In this paper, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible and are provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with a total of 27,000 labeled and geo-referenced images. We provide benchmarks for this novel dataset with its spectral bands using state-of-the-art deep Convolutional Neural Networks (CNNs). With the proposed novel dataset, we achieved an overall classification accuracy of 98.57%. The resulting classification system opens a gate towards a number of Earth observation applications. We demonstrate how this classification system can be used for detecting land use and land cover changes and how it can assist in improving geographical maps. The geo-referenced dataset EuroSAT is made publicly available at https://github.com/phelber/eurosat.
Tasks
Published	2017-08-31
URL	http://arxiv.org/abs/1709.00029v2
PDF	http://arxiv.org/pdf/1709.00029v2.pdf
PWC	https://paperswithcode.com/paper/eurosat-a-novel-dataset-and-deep-learning
Repo	https://github.com/pashu123/Satellite-Data-Analysis
Framework	none

On orthogonality and learning recurrent networks with long term dependencies


Title	On orthogonality and learning recurrent networks with long term dependencies
Authors	Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, Chris Pal
Abstract	It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and therein control the degree of expansivity induced during backpropagation. We find that hard constraints on orthogonality can negatively affect the speed of convergence and model performance.
Tasks
Published	2017-01-31
URL	http://arxiv.org/abs/1702.00071v4
PDF	http://arxiv.org/pdf/1702.00071v4.pdf
PWC	https://paperswithcode.com/paper/on-orthogonality-and-learning-recurrent
Repo	https://github.com/veugene/spectre_release
Framework	none
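
A soft orthogonality constraint is commonly written as a penalty on how far $W^T W$ is from the identity. The sketch below assumes that common form, the squared Frobenius norm of $W^T W - I$, which would be added to the training loss with some weight. It is not the factorization-based parameterization the paper proposes, and the function name is hypothetical.

```python
def orthogonality_penalty(w):
    """Squared Frobenius norm of W^T W - I for a matrix given as a list of rows."""
    rows, cols = len(w), len(w[0])
    penalty = 0.0
    for a in range(cols):
        for b in range(cols):
            # Entry (a, b) of the Gram matrix W^T W.
            gram = sum(w[r][a] * w[r][b] for r in range(rows))
            target = 1.0 if a == b else 0.0
            penalty += (gram - target) ** 2
    return penalty


# An orthogonal matrix (a rotation) incurs no penalty; a scaled one does.
rotation = [[0.6, -0.8], [0.8, 0.6]]
scaled = [[2.0, 0.0], [0.0, 2.0]]
print(orthogonality_penalty(rotation), orthogonality_penalty(scaled))
```

A hard constraint would instead keep the penalty at exactly zero throughout training, which is the setting the abstract reports can slow convergence.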

Autoregressive Convolutional Neural Networks for Asynchronous Time Series


Title	Autoregressive Convolutional Neural Networks for Asynchronous Time Series
Authors	Mikołaj Bińkowski, Gautier Marti, Philippe Donnat
Abstract	We propose Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of adjusted regressors, while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series and is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index, an artificially generated noisy autoregressive series and UCI household electricity consumption dataset. The proposed architecture achieves promising results as compared to convolutional and recurrent neural networks.
Tasks	Time Series
Published	2017-03-12
URL	http://arxiv.org/abs/1703.04122v4
PDF	http://arxiv.org/pdf/1703.04122v4.pdf
PWC	https://paperswithcode.com/paper/autoregressive-convolutional-neural-networks
Repo	https://github.com/Fangyh09/Autoregressive-Convolutional-Neural-Networks.Pytorch
Framework	pytorch
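
The AR-like weighting described in the abstract can be illustrated with a toy calculation. In this sketch the offsets and significance scores are plain numbers passed in by the caller, whereas in the paper they are produced by convolutional networks. The softmax normalization of the weights and all names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prediction(values, offsets, scores):
    """Weighted sum of adjusted regressors (value + offset) with softmax weights."""
    weights = softmax(scores)
    return sum(w * (v + o) for w, v, o in zip(weights, values, offsets))


# Hypothetical past observations, learned offsets, and significance scores.
past_values = [1.0, 3.0]
offsets = [0.5, -0.5]
scores = [0.0, 0.0]
prediction = weighted_prediction(past_values, offsets, scores)
```

With equal scores the prediction is the plain average of the adjusted regressors; a much larger score on one observation makes the prediction follow that observation almost exclusively.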

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning


Title	CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Authors	Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Y. Ng
Abstract	We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our algorithm, CheXNet, is a 121-layer convolutional neural network trained on ChestX-ray14, currently the largest publicly available chest X-ray dataset, containing over 100,000 frontal-view X-ray images with 14 diseases. Four practicing academic radiologists annotate a test set, on which we compare the performance of CheXNet to that of radiologists. We find that CheXNet exceeds average radiologist performance on the F1 metric. We extend CheXNet to detect all 14 diseases in ChestX-ray14 and achieve state of the art results on all 14 diseases.
Tasks	Pneumonia Detection
Published	2017-11-14
URL	http://arxiv.org/abs/1711.05225v3
PDF	http://arxiv.org/pdf/1711.05225v3.pdf
PWC	https://paperswithcode.com/paper/chexnet-radiologist-level-pneumonia-detection
Repo	https://github.com/zoogzog/chexnet
Framework	pytorch
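
The comparison with radiologists is reported on the F1 metric, the harmonic mean of precision and recall. A minimal sketch of that metric for binary labels follows; the label lists are made-up examples, not data from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class (e.g. pneumonia)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for four images.
labels = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
score = f1_score(labels, predictions)
```

Returning 0.0 when there are no true positives is a convention chosen here to avoid division by zero.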

Accelerated Alternating Projections for Robust Principal Component Analysis


Title	Accelerated Alternating Projections for Robust Principal Component Analysis
Authors	HanQin Cai, Jian-Feng Cai, Ke Wei
Abstract	We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\boldsymbol{L}$ and a sparse matrix $\boldsymbol{S}$ from their sum $\boldsymbol{D}=\boldsymbol{L}+\boldsymbol{S}$. In this paper, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing alternating projections proposed in [Netrapalli, Praneeth, et al., 2014] when updating the low rank factor. The acceleration is achieved by first projecting a matrix onto some low dimensional subspace before obtaining a new estimate of the low rank matrix via truncated SVD. An exact recovery guarantee is established, showing linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05519v4
PDF	http://arxiv.org/pdf/1711.05519v4.pdf
PWC	https://paperswithcode.com/paper/accelerated-alternating-projections-for
Repo	https://github.com/caesarcai/AccAltProj_for_RPCA
Framework	none

Almost instant brain atlas segmentation for large-scale studies


Title	Almost instant brain atlas segmentation for large-scale studies
Authors	Alex Fedorov, Eswar Damaraju, Vince Calhoun, Sergey Plis
Abstract	Large scale studies of group differences in healthy controls and patients and screenings for early stage disease prevention programs require processing and analysis of extensive multisubject datasets. Complexity of the task increases even further when segmenting structural MRI of the brain into an atlas with more than 50 regions. Current automatic approaches are time-consuming and hardly scalable; they often involve many error prone intermediate steps and don’t utilize other available modalities. To alleviate these problems, we propose a feedforward fully convolutional neural network trained on the output produced by the state of the art models. The speed of the neural network on available powerful GPUs makes this analysis much easier and faster (from $>10$ hours to a minute). The proposed model is more than two orders of magnitude faster than the state of the art and yet as accurate. We have evaluated the network’s performance by comparing it with the state of the art in the task of differentiating region volumes of healthy controls and patients with schizophrenia on a dataset with 311 subjects. This comparison provides strong evidence that speed did not harm the accuracy. The overall quality may also be increased by utilizing multi-modal datasets (not an easy task for other models) by simply adding more modalities as an input. Our model will be useful in large-scale studies as well as in clinical care solutions, where it can significantly reduce delay between the patient screening and the result.
Tasks
Published	2017-11-01
URL	http://arxiv.org/abs/1711.00457v1
PDF	http://arxiv.org/pdf/1711.00457v1.pdf
PWC	https://paperswithcode.com/paper/almost-instant-brain-atlas-segmentation-for
Repo	https://github.com/Entodi/meshnet-pytorch
Framework	pytorch

Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension


Title	Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
Authors	David Golub, Po-Sen Huang, Xiaodong He, Li Deng
Abstract	We develop a technique for transfer learning in machine comprehension (MC) using a novel two-stage synthesis network (SynNet). Given a high-performing MC model in one domain, our technique aims to answer questions about documents in another domain, where we use no labeled data of question-answer pairs. Using the proposed SynNet with a pretrained model from the SQuAD dataset on the challenging NewsQA dataset, we achieve an F1 measure of 44.3% with a single model and 46.6% with an ensemble, approaching performance of in-domain models (F1 measure of 50.0%) and outperforming the out-of-domain baseline of 7.6%, without use of provided annotations.
Tasks	Reading Comprehension, Transfer Learning
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09789v3
PDF	http://arxiv.org/pdf/1706.09789v3.pdf
PWC	https://paperswithcode.com/paper/two-stage-synthesis-networks-for-transfer
Repo	https://github.com/davidgolub/QuestionGeneration
Framework	pytorch

CoupleNet: Coupling Global Structure with Local Parts for Object Detection


Title	CoupleNet: Coupling Global Structure with Local Parts for Object Detection
Authors	Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, Hanqing Lu
Abstract	The region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN or R-FCN have already shown promising results for object detection by combining the region proposal subnetwork and the classification subnetwork together. Although R-FCN has achieved higher detection speed while keeping the detection performance, the global structure information is ignored by the position-sensitive score maps. To fully explore the local and global properties, in this paper, we propose a novel fully convolutional network, named CoupleNet, to couple the global structure with local parts for object detection. Specifically, the object proposals obtained by the Region Proposal Network (RPN) are fed into the coupling module which consists of two branches. One branch adopts the position-sensitive RoI (PSRoI) pooling to capture the local part information of the object, while the other employs the RoI pooling to encode the global and context information. Next, we design different coupling strategies and normalization ways to make full use of the complementary advantages between the global and local branches. Extensive experiments demonstrate the effectiveness of our approach. We achieve state-of-the-art results on all three challenging datasets, i.e. a mAP of 82.7% on VOC07, 80.4% on VOC12, and 34.4% on COCO. Codes will be made publicly available.
Tasks	Object Detection
Published	2017-08-09
URL	http://arxiv.org/abs/1708.02863v1
PDF	http://arxiv.org/pdf/1708.02863v1.pdf
PWC	https://paperswithcode.com/paper/couplenet-coupling-global-structure-with
Repo	https://github.com/tshizys/CoupleNet
Framework	caffe2

Contextually Customized Video Summaries via Natural Language


Title	Contextually Customized Video Summaries via Natural Language
Authors	Jinsoo Choi, Tae-Hyun Oh, In So Kweon
Abstract	The best summary of a long video differs among different people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. First, we train a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data in a progressive and residual manner. Given a user-specific text description, our algorithm is able to select semantically relevant video segments and produce a temporally aligned video summary. In order to evaluate our textually customized video summaries, we conduct experimental comparison with baseline methods that utilize ground-truth information. Despite the challenging baselines, our method still manages to show comparable or even exceeding performance. We also show that our method is able to generate semantically diverse video summaries by only utilizing the learned visual embeddings.
Tasks
Published	2017-02-06
URL	http://arxiv.org/abs/1702.01528v3
PDF	http://arxiv.org/pdf/1702.01528v3.pdf
PWC	https://paperswithcode.com/paper/contextually-customized-video-summaries-via
Repo	https://github.com/jinsc37/Page-VidSumm
Framework	none