January 31, 2020

3137 words 15 mins read

Paper Group AWR 444

Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting. Towards Debiasing Fact Verification Models. Learning a Spatio-Temporal Embedding for Video Instance Segmentation. Universally Slimmable Networks and Improved Training Techniques. Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth. …

Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting

Title Deep Integro-Difference Equation Models for Spatio-Temporal Forecasting
Authors Andrew Zammit-Mangion, Christopher K. Wikle
Abstract Integro-difference equation (IDE) models describe the conditional dependence between the spatial process at a future time point and the process at the present time point through an integral operator. Nonlinearity or temporal dependence in the dynamics is often captured by allowing the operator parameters to vary temporally, or by re-fitting a model with a temporally-invariant linear operator in a sliding window. Both procedures tend to be excellent for prediction purposes over small time horizons, but are generally time-consuming and, crucially, do not provide a global prior model for the temporally-varying dynamics that is realistic. Here, we tackle these two issues by using a deep convolutional neural network (CNN) in a hierarchical statistical IDE framework, where the CNN is designed to extract process dynamics from the process’ most recent behaviour. Once the CNN is fitted, probabilistic forecasting can be done extremely quickly online using an ensemble Kalman filter with no requirement for repeated parameter estimation. We conduct an experiment where we train the model using 13 years of daily sea-surface temperature data in the North Atlantic Ocean. Forecasts are seen to be accurate and calibrated. A key advantage of our approach is that the CNN provides a global prior model for the dynamics that is realistic, interpretable, and computationally efficient. We show the versatility of the approach by successfully producing 10-minute nowcasts of weather radar reflectivities in Sydney using the same model that was trained on daily sea-surface temperature data in the North Atlantic Ocean.
Tasks Spatio-Temporal Forecasting
Published 2019-10-29
URL https://arxiv.org/abs/1910.13524v3
PDF https://arxiv.org/pdf/1910.13524v3.pdf
PWC https://paperswithcode.com/paper/deep-integro-difference-equation-models-for
Repo https://github.com/andrewzm/deepIDE
Framework tf
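
The online forecasting step in this framework relies on a standard ensemble Kalman filter update once the CNN has been fitted. Below is a minimal NumPy sketch of one stochastic-EnKF analysis step, purely for illustration; it is not the authors' implementation (see the linked repo for that), and the variable names are ours.

```python
import numpy as np

def enkf_analysis(ensemble, H, y, obs_var, rng=None):
    """One stochastic ensemble Kalman filter analysis step (illustrative sketch).

    ensemble: (n_state, n_ens) forecast ensemble
    H:        (n_obs, n_state) linear observation operator
    y:        (n_obs,) observation vector
    obs_var:  scalar observation-error variance
    """
    rng = rng or np.random.default_rng()
    n_obs, n_ens = H.shape[0], ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)    # state anomalies
    HX = H @ ensemble
    HA = HX - HX.mean(axis=1, keepdims=True)               # observed anomalies
    # sample covariances and Kalman gain K = P_xh @ inv(P_hh)
    P_hh = HA @ HA.T / (n_ens - 1) + obs_var * np.eye(n_obs)
    P_xh = X @ HA.T / (n_ens - 1)
    K = np.linalg.solve(P_hh.T, P_xh.T).T
    # perturbed observations keep the update consistent with the obs noise
    Y = y[:, None] + np.sqrt(obs_var) * rng.standard_normal((n_obs, n_ens))
    return ensemble + K @ (Y - HX)
```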

Towards Debiasing Fact Verification Models

Title Towards Debiasing Fact Verification Models
Authors Tal Schuster, Darsh J Shah, Yun Jie Serene Yeo, Daniel Filizzola, Enrico Santus, Regina Barzilay
Abstract Fact verification requires validating a claim in the context of evidence. We show, however, that in the popular FEVER dataset this might not necessarily be the case. Claim-only classifiers perform competitively with top evidence-aware models. In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. We create an evaluation set that avoids those idiosyncrasies. The performance of FEVER-trained models significantly drops when evaluated on this test set. Therefore, we introduce a regularization method which alleviates the effect of bias in the training data, obtaining improvements on the newly created test set. This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models.
Tasks
Published 2019-08-14
URL https://arxiv.org/abs/1908.05267v2
PDF https://arxiv.org/pdf/1908.05267v2.pdf
PWC https://paperswithcode.com/paper/towards-debiasing-fact-verification-models
Repo https://github.com/TalSchuster/FeverSymmetric
Framework pytorch
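
As an illustration of the general idea, one common way to regularize an evidence-aware model against claim-only cues is to down-weight training examples that a frozen claim-only (bias) classifier already predicts confidently. The PyTorch sketch below shows such a re-weighted cross-entropy; the paper's exact regularization scheme may differ, and `bias_probs` is a hypothetical input.

```python
import torch
import torch.nn.functional as F

def bias_reweighted_loss(logits, bias_probs, labels):
    """Cross-entropy re-weighted by a frozen claim-only (bias) classifier (sketch).

    logits:     (B, C) predictions of the evidence-aware model
    bias_probs: (B, C) softmax outputs of the claim-only classifier (hypothetical input)
    labels:     (B,) gold labels
    """
    ce = F.cross_entropy(logits, labels, reduction="none")            # (B,)
    # down-weight examples the bias model already predicts confidently
    w = 1.0 - bias_probs[torch.arange(labels.size(0)), labels]
    return (w * ce).mean()
```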

Learning a Spatio-Temporal Embedding for Video Instance Segmentation

Title Learning a Spatio-Temporal Embedding for Video Instance Segmentation
Authors Anthony Hu, Alex Kendall, Roberto Cipolla
Abstract We present a novel embedding approach for video instance segmentation. Our method learns a spatio-temporal embedding integrating cues from appearance, motion, and geometry; a 3D causal convolutional network models motion, and a monocular self-supervised depth loss models geometry. In this embedding space, video-pixels of the same instance are clustered together while being separated from other instances, to naturally track instances over time without any complex post-processing. Our network runs in real-time as our architecture is entirely causal - we do not incorporate information from future frames, contrary to previous methods. We show that our model can accurately track and segment instances, even with occlusions and missed detections, advancing the state-of-the-art on the KITTI Multi-Object and Tracking Dataset.
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-12-19
URL https://arxiv.org/abs/1912.08969v1
PDF https://arxiv.org/pdf/1912.08969v1.pdf
PWC https://paperswithcode.com/paper/learning-a-spatio-temporal-embedding-for-1
Repo https://github.com/jiawen9611/Awesome-Video-Instance-Segmentation
Framework pytorch
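
The clustering objective described above (video-pixels of the same instance pulled together, different instances pushed apart) can be illustrated with a generic pull/push discriminative embedding loss. The PyTorch sketch below is our own reading of that idea, not the authors' loss; the margins and shapes are assumptions.

```python
import torch

def embedding_cluster_loss(emb, inst_ids, margin_v=0.5, margin_d=1.5):
    """Pull video-pixel embeddings toward their instance mean and push
    different instance means apart (generic pull/push loss; sketch only).

    emb:      (N, D) embeddings of N video-pixels
    inst_ids: (N,) integer instance ids
    """
    means, pull_terms = [], []
    for i in inst_ids.unique():
        e = emb[inst_ids == i]
        mu = e.mean(dim=0)
        means.append(mu)
        # pull: pixels further than margin_v from their instance mean are penalized
        pull_terms.append(torch.clamp((e - mu).norm(dim=1) - margin_v, min=0).pow(2).mean())
    pull = torch.stack(pull_terms).mean()
    means = torch.stack(means)                                   # (K, D)
    if means.size(0) > 1:
        # push: instance means closer than 2 * margin_d are penalized
        d = torch.cdist(means, means)
        off_diag = ~torch.eye(means.size(0), dtype=torch.bool, device=emb.device)
        push = torch.clamp(2 * margin_d - d[off_diag], min=0).pow(2).mean()
    else:
        push = emb.new_zeros(())
    return pull + push
```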

Universally Slimmable Networks and Improved Training Techniques

Title Universally Slimmable Networks and Improved Training Techniques
Authors Jiahui Yu, Thomas Huang
Abstract Slimmable networks are a family of neural networks that can instantly adjust the runtime width. The width can be chosen from a predefined set of widths to adaptively optimize accuracy-efficiency trade-offs at runtime. In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. We further propose two improved training techniques for US-Nets, named the sandwich rule and inplace distillation, to enhance the training process and boost testing accuracy. We show improved performance of universally slimmable MobileNet v1 and MobileNet v2 on the ImageNet classification task, compared with individually trained ones and 4-switch slimmable network baselines. We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning. Extensive ablation experiments on these representative tasks demonstrate the effectiveness of our proposed methods. Our discovery opens up the possibility to directly evaluate the FLOPs-accuracy spectrum of network architectures. Code and models are available at: https://github.com/JiahuiYu/slimmable_networks
Tasks Image Super-Resolution, Super-Resolution
Published 2019-03-12
URL https://arxiv.org/abs/1903.05134v2
PDF https://arxiv.org/pdf/1903.05134v2.pdf
PWC https://paperswithcode.com/paper/universally-slimmable-networks-and-improved
Repo https://github.com/JiahuiYu/slimmable_networks
Framework pytorch
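
The abstract names two training techniques: the sandwich rule (always train the smallest width, the largest width, and a few randomly sampled widths) and inplace distillation (the full-width network's predictions serve as soft targets for the narrower ones). The PyTorch sketch below shows one such training step; the `model.set_width(w)` switch is an assumed interface, and the released code at the linked repo is the authoritative version.

```python
import random
import torch.nn.functional as F

def us_net_train_step(model, x, y, optimizer, n_random=2, width_range=(0.25, 1.0)):
    """One training step with the sandwich rule and inplace distillation (sketch).

    `model.set_width(w)` is an assumed interface that selects the fraction of
    channels to execute; slimmable layers implement this in the released code.
    """
    optimizer.zero_grad()
    # 1) largest width: ordinary cross-entropy; its (detached) predictions
    #    become the soft targets for all narrower widths (inplace distillation)
    model.set_width(width_range[1])
    logits_full = model(x)
    loss = F.cross_entropy(logits_full, y)
    soft_targets = logits_full.detach().softmax(dim=1)
    # 2) smallest width plus a few random widths (the sandwich rule)
    widths = [width_range[0]] + [random.uniform(*width_range) for _ in range(n_random)]
    for w in widths:
        model.set_width(w)
        logits = model(x)
        loss = loss + F.kl_div(logits.log_softmax(dim=1), soft_targets, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```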

Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth

Title Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth
Authors Rui Wang, Stephen M. Pizer, Jan-Michael Frahm
Abstract Deep learning-based, single-view depth estimation methods have recently shown highly promising results. However, such methods ignore one of the most important features for determining depth in the human vision system, which is motion. We propose a learning-based, multi-view dense depth map and odometry estimation method that uses Recurrent Neural Networks (RNN) and trains utilizing multi-view image reprojection and forward-backward flow-consistency losses. Our model can be trained in a supervised or even unsupervised mode. It is designed for depth and visual odometry estimation from video where the input frames are temporally correlated. However, it also generalizes to single-view depth estimation. Our method produces superior results to the state-of-the-art approaches for single-view and multi-view learning-based depth estimation on the KITTI driving dataset.
Tasks Depth Estimation, MULTI-VIEW LEARNING, Visual Odometry
Published 2019-04-15
URL http://arxiv.org/abs/1904.07087v1
PDF http://arxiv.org/pdf/1904.07087v1.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-network-for-un-supervised
Repo https://github.com/wrlife/RNN_depth_pose
Framework tf
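
One of the training signals mentioned above is a forward-backward flow-consistency loss. As a hedged illustration (not the authors' TensorFlow code), the PyTorch sketch below warps the backward flow with the forward flow and penalizes their sum, which should vanish for consistent, non-occluded pixels.

```python
import torch
import torch.nn.functional as F

def flow_warp(src, flow):
    """Sample `src` at locations displaced by `flow` (in pixels, (dx, dy) order)."""
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(flow.device)        # (2, H, W)
    coords = base.unsqueeze(0) + flow                                   # (B, 2, H, W)
    # normalize coordinates to [-1, 1] for grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                                # (B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)

def fb_consistency_loss(flow_fw, flow_bw):
    """Warp the backward flow into the forward frame and penalize the residual,
    which is ~0 for consistent, non-occluded pixels."""
    flow_bw_warped = flow_warp(flow_bw, flow_fw)
    return (flow_fw + flow_bw_warped).abs().mean()
```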

Photorealistic Style Transfer via Wavelet Transforms

Title Photorealistic Style Transfer via Wavelet Transforms
Authors Jaejun Yoo, Youngjung Uh, Sanghyuk Chun, Byeongkyu Kang, Jung-Woo Ha
Abstract Recent style transfer models have provided promising artistic results. However, given a photograph as a reference style, existing methods are limited by spatial distortions or unrealistic artifacts, which should not happen in real photographs. We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. The key ingredient of our method is wavelet transforms that naturally fit in deep networks. We propose a wavelet-corrected transfer based on whitening and coloring transforms (WCT$^2$) that allows features to preserve their structural information and the statistical properties of the VGG feature space during stylization. This is the first and only end-to-end model that can stylize a $1024\times1024$ resolution image in 4.7 seconds, giving a pleasing and photorealistic quality without any post-processing. Last but not least, our model provides a stable video stylization without temporal constraints. Our code, generated images, and pre-trained models are all available at https://github.com/ClovaAI/WCT2.
Tasks Style Transfer
Published 2019-03-23
URL https://arxiv.org/abs/1903.09760v2
PDF https://arxiv.org/pdf/1903.09760v2.pdf
PWC https://paperswithcode.com/paper/photorealistic-style-transfer-via-wavelet
Repo https://github.com/leolle/StyleTransfer
Framework pytorch
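
The whitening and coloring transform (WCT) at the core of WCT$^2$ replaces the covariance of content features with that of style features. The sketch below shows the classic WCT step on flattened VGG-style feature matrices; the paper's actual contribution, performing this inside Haar wavelet pooling/unpooling so that structural information is preserved, is not reproduced here.

```python
import torch

def wct(content_feat, style_feat, eps=1e-5):
    """Whitening and coloring transform on flattened (C, H*W) feature matrices.

    This is the classic WCT core only; WCT^2 applies it inside Haar wavelet
    pooling/unpooling, which is not shown here.
    """
    def centered(f):
        mu = f.mean(dim=1, keepdim=True)
        return f - mu, mu

    fc, _ = centered(content_feat)
    fs, mu_s = centered(style_feat)
    eye = torch.eye(fc.size(0), device=fc.device)
    # whitening: remove the content feature covariance
    ec, vc = torch.linalg.eigh(fc @ fc.t() / (fc.size(1) - 1) + eps * eye)
    whiten = vc @ torch.diag(ec.clamp(min=eps).rsqrt()) @ vc.t()
    # coloring: impose the style feature covariance, then restore the style mean
    es, vs = torch.linalg.eigh(fs @ fs.t() / (fs.size(1) - 1) + eps * eye)
    color = vs @ torch.diag(es.clamp(min=eps).sqrt()) @ vs.t()
    return color @ (whiten @ fc) + mu_s
```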

Word2vec to behavior: morphology facilitates the grounding of language in machines

Title Word2vec to behavior: morphology facilitates the grounding of language in machines
Authors David Matthews, Sam Kriegman, Collin Cappelle, Josh Bongard
Abstract Enabling machines to respond appropriately to natural language commands could greatly expand the number of people to whom they could be of service. Recently, advances in neural network-trained word embeddings have empowered non-embodied text-processing algorithms, and suggest they could be of similar utility for embodied machines. Here we introduce a method that does so by training robots to act similarly in response to semantically similar word2vec-encoded commands. We show that this enables them to act appropriately, after training, in response to previously unheard commands. Finally, we show that inducing such an alignment between motoric and linguistic similarities can be facilitated or hindered by the mechanical structure of the robot. This points to future, large-scale methods that find and exploit relationships between action, language, and robot structure.
Tasks Word Embeddings
Published 2019-08-03
URL https://arxiv.org/abs/1908.01211v1
PDF https://arxiv.org/pdf/1908.01211v1.pdf
PWC https://paperswithcode.com/paper/word2vec-to-behavior-morphology-facilitates
Repo https://github.com/davidmatthews1uvm/2019-IROS
Framework none

Model selection for contextual bandits

Title Model selection for contextual bandits
Authors Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo
Abstract We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for linear contextual bandits. We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d_{m^\star}^{1/3})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$. The algorithm also achieves regret $\tilde{O}(T^{3/4} + \sqrt{Td_{m^\star}})$, which is optimal for $d_{m^{\star}}\geq{}\sqrt{T}$. This is the first model selection result for contextual bandits with non-vacuous regret for all values of $d_{m^\star}$, and to the best of our knowledge is the first positive result of this type for any online learning setting with partial information. The core of the algorithm is a new estimator for the gap in the best loss achievable by two linear policy classes, which we show admits a convergence rate faster than the rate required to learn the parameters for either class.
Tasks Model Selection, Multi-Armed Bandits
Published 2019-06-03
URL https://arxiv.org/abs/1906.00531v3
PDF https://arxiv.org/pdf/1906.00531v3.pdf
PWC https://paperswithcode.com/paper/190600531
Repo https://github.com/akshaykr/oracle_cb
Framework none

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

Title Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Authors Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid
Abstract Recent work has shown that CNN-based depth and ego-motion estimators can be learned using unlabelled monocular videos. However, the performance is limited by unidentified moving objects that violate the underlying static scene assumption in geometric image reconstruction. More significantly, due to lack of proper constraints, networks output scale-inconsistent results over different samples, i.e., the ego-motion network cannot provide full camera trajectories over a long video sequence because of the per-frame scale ambiguity. This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions. Since we do not leverage multi-task learning like recent works, our framework is much simpler and more efficient. Comprehensive evaluation results demonstrate that our depth estimator achieves the state-of-the-art performance on the KITTI dataset. Moreover, we show that our ego-motion network is able to predict a globally scale-consistent camera trajectory for long video sequences, and the resulting visual odometry accuracy is competitive with the recent model that is trained using stereo videos. To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
Tasks Depth And Camera Motion, Depth Estimation, Monocular Depth Estimation, Visual Odometry
Published 2019-08-28
URL https://arxiv.org/abs/1908.10553v2
PDF https://arxiv.org/pdf/1908.10553v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-scale-consistent-depth-and-ego
Repo https://github.com/TopGun666/SC-SfMLearner-Release
Framework pytorch
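
The geometry consistency loss compares the reference-frame depth projected into the target frame with the target frame's own predicted depth sampled at the same pixels, and the same discrepancy yields the self-discovered mask for moving objects and occlusions. A small PyTorch sketch of that comparison is below; it assumes the warping/interpolation has already been done elsewhere, the variable names are ours, and the exact weighting in the released code may differ.

```python
import torch

def geometry_consistency(depth_warped, depth_sampled):
    """Geometry consistency loss and self-discovered mask (sketch).

    depth_warped:  (B, 1, H, W) reference depth projected into the target frame
    depth_sampled: (B, 1, H, W) target depth prediction sampled at the same pixels
    """
    diff = (depth_warped - depth_sampled).abs() / (depth_warped + depth_sampled)
    loss = diff.mean()
    # large inconsistency hints at moving objects or occlusions; the mask can
    # down-weight the photometric loss at those pixels
    mask = (1.0 - diff).detach()
    return loss, mask
```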

WPU-Net: Boundary Learning by Using Weighted Propagation in Convolution Network

Title WPU-Net: Boundary Learning by Using Weighted Propagation in Convolution Network
Authors Boyuan Ma, Chuni Liu, Xiaojuan Ban, Hao Wang, Weihua Xue, Haiyou Huang
Abstract Deep learning has driven great progress in natural and biological image processing. However, in materials science and engineering, microscopic images often contain flaws and indistinct regions caused by complex sample preparation, or even by the material itself, hindering the detection of target objects. In this work, we propose WPU-Net, which redesigns the architecture and weighted loss of U-Net, forcing the network to integrate information from adjacent slices and pay more attention to topology in the boundary detection task. WPU-Net is then applied to a typical materials example, i.e., grain boundary detection in polycrystalline materials. Experiments demonstrate that the proposed method achieves promising performance and outperforms state-of-the-art methods. Besides, we propose a new method for object tracking between adjacent slices, which can effectively reconstruct the 3D structure of the whole material. Finally, we present a materials microscopic image dataset with the goal of advancing the state-of-the-art in image processing for materials science.
Tasks Boundary Detection, Object Tracking
Published 2019-05-22
URL https://arxiv.org/abs/1905.09226v2
PDF https://arxiv.org/pdf/1905.09226v2.pdf
PWC https://paperswithcode.com/paper/wpu-netboundary-learning-by-using-weighted
Repo https://github.com/clovermini/WPU-Net
Framework pytorch
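
The weighted-loss idea can be illustrated with a generic boundary-weighted cross-entropy that up-weights pixels near class boundaries. The sketch below is only that generic illustration; WPU-Net's actual weights additionally propagate information from the adjacent slice, which is not modeled here.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy import ndimage

def boundary_weighted_ce(logits, target, boundary_weight=10.0, band=3):
    """Cross-entropy with extra weight on pixels near class boundaries (generic sketch).

    logits: (B, C, H, W) predictions; target: (B, H, W) integer labels.
    """
    weights = []
    for t in target.cpu().numpy():
        edges = ndimage.morphological_gradient(t, size=3) > 0    # label transitions
        dist = ndimage.distance_transform_edt(~edges)            # distance to nearest edge
        weights.append(1.0 + boundary_weight * (dist < band))
    w = torch.as_tensor(np.stack(weights), dtype=torch.float32, device=logits.device)
    ce = F.cross_entropy(logits, target, reduction="none")       # (B, H, W)
    return (w * ce).mean()
```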

Expected Tight Bounds for Robust Deep Neural Network Training

Title Expected Tight Bounds for Robust Deep Neural Network Training
Authors Salman Alsubaihi, Adel Bibi, Modar Alfadly, Abdullah Hamdi, Bernard Ghanem
Abstract Training Deep Neural Networks (DNNs) that are robust to norm-bounded adversarial attacks remains an elusive problem. While verification-based methods are generally too expensive to robustly train large networks, it was demonstrated in Gowal et al. that bounded input intervals can be inexpensively propagated per layer through large networks. This interval bound propagation (IBP) approach led to high robustness and was the first to be employed on large networks. However, due to the very loose nature of the IBP bounds, particularly for large networks, the required training procedure is complex and involved. In this paper, we closely examine the bounds of a block of layers composed of an affine layer followed by a ReLU nonlinearity followed by another affine layer. To this end, we propose expected bounds, true bounds in expectation, that are provably tighter than IBP bounds in expectation. We then extend this result to deeper networks through blockwise propagation and show that we can achieve orders of magnitude tighter bounds compared to IBP. With such tight bounds, we demonstrate that a simple standard training procedure can achieve the best robustness-accuracy trade-off across several architectures on both MNIST and CIFAR10.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12418v4
PDF https://arxiv.org/pdf/1905.12418v4.pdf
PWC https://paperswithcode.com/paper/probabilistically-true-and-tight-bounds-for
Repo https://github.com/ModarTensai/ptb-neurips19
Framework pytorch
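
For context, the standard IBP bounds that the paper tightens propagate an axis-aligned box through each layer: an affine layer maps the box's center and radius exactly, and ReLU, being monotone, is applied element-wise to both endpoints. The PyTorch sketch below shows those two standard propagation rules; the paper's expected (tighter) bounds are not reproduced here.

```python
import torch

def ibp_affine(lower, upper, weight, bias):
    """Propagate an interval through x -> x @ W.T + b (standard IBP rule)."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    new_center = center @ weight.t() + bias
    new_radius = radius @ weight.abs().t()
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    """ReLU is monotone, so the interval endpoints pass through element-wise."""
    return lower.clamp(min=0), upper.clamp(min=0)
```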

LIP: Local Importance-based Pooling

Title LIP: Local Importance-based Pooling
Authors Ziteng Gao, Limin Wang, Gangshan Wu
Abstract Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and less memory consumption. However, for discriminative tasks, there is a possibility that these layers lose the discriminative details due to improper pooling strategies, which could hinder the learning process and eventually result in suboptimal models. In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance view. In this framework, we analyze the issues of these widely-used pooling layers and figure out the criteria for designing an effective downsampling layer. According to this analysis, we propose a conceptually simple, general, and effective pooling layer based on local importance modeling, termed Local Importance-based Pooling (LIP). LIP can automatically enhance discriminative features during the downsampling procedure by learning adaptive importance weights based on inputs. Experimental results show that LIP consistently yields notable gains with different depths and different architectures on ImageNet classification. On the challenging MS COCO dataset, detectors with our LIP-ResNets as backbones obtain a consistent improvement ($\ge 1.4\%$) over the vanilla ResNets, and especially achieve the current state-of-the-art performance in detecting small objects under the single-scale testing scheme.
Tasks Image Classification, Object Detection
Published 2019-08-12
URL https://arxiv.org/abs/1908.04156v3
PDF https://arxiv.org/pdf/1908.04156v3.pdf
PWC https://paperswithcode.com/paper/lip-local-importance-based-pooling
Repo https://github.com/sebgao/LIP
Framework pytorch
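
LIP can be read as a weighted average pool whose weights are the exponential of a learned importance (logit) map. The PyTorch sketch below follows that reading with a single 1x1 convolution as the logit module, which is a simplification; the official implementation at the linked repo supports richer logit modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LIPPool2d(nn.Module):
    """Local importance-based pooling as a weighted average pool (simplified sketch)."""

    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        # the logit module here is a single 1x1 conv for simplicity
        self.logit = nn.Conv2d(channels, channels, kernel_size=1)
        self.k, self.s = kernel_size, stride

    def forward(self, x):
        w = torch.exp(self.logit(x))                 # learned importance weights
        num = F.avg_pool2d(x * w, self.k, self.s)    # importance-weighted sum
        den = F.avg_pool2d(w, self.k, self.s)        # normalization
        return num / (den + 1e-8)
```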

Real-time Vision-based Depth Reconstruction with NVidia Jetson

Title Real-time Vision-based Depth Reconstruction with NVidia Jetson
Authors Andrey Bokovoy, Kirill Muravyev, Konstantin Yakovlev
Abstract Vision-based depth reconstruction is a challenging problem extensively studied in computer vision but still lacking a universal solution. Reconstructing depth from a single image is particularly valuable to mobile robotics as it can be embedded into modern vision-based simultaneous localization and mapping (vSLAM) methods, providing them with the metric information needed to construct accurate maps in real scale. Typically, depth reconstruction is done nowadays via fully-convolutional neural networks (FCNNs). In this work we experiment with several FCNN architectures and introduce a few enhancements aimed at increasing both the effectiveness and the efficiency of the inference. We experimentally determine the solution that provides the best performance/accuracy tradeoff and is able to run on NVidia Jetson with framerates exceeding 16 FPS for 320 x 240 input. We also evaluate the suggested models by conducting monocular vSLAM of an unknown indoor environment on NVidia Jetson TX2 in real-time. An open-source implementation of the models and the inference node for Robot Operating System (ROS) are available at https://github.com/CnnDepth/tx2_fcnn_node.
Tasks Simultaneous Localization and Mapping
Published 2019-07-16
URL https://arxiv.org/abs/1907.07210v1
PDF https://arxiv.org/pdf/1907.07210v1.pdf
PWC https://paperswithcode.com/paper/real-time-vision-based-depth-reconstruction
Repo https://github.com/CnnDepth/tx2_fcnn_node
Framework tf

A Prototypical Triplet Loss for Cover Detection

Title A Prototypical Triplet Loss for Cover Detection
Authors Guillaume Doras, Geoffroy Peeters
Abstract Automatic cover detection – the task of finding, in an audio dataset, all covers of a query track – has long been a challenging theoretical problem in the MIR community. It has also become a practical need for music composers’ societies, which must automatically detect whether an audio excerpt embeds musical content belonging to their catalog. In a recent work, we addressed this problem with a convolutional neural network that maps each track’s dominant melody to an embedding vector and is trained to minimize the distance between cover pairs in the embedding space while maximizing it for non-covers. We showed in particular that training this model with enough works having five or more covers yields state-of-the-art results. This, however, does not reflect the realistic use case, where music catalogs typically contain works with zero or at most one or two covers. We thus introduce here a new test set incorporating these constraints, and propose two contributions to improve our model’s accuracy under these stricter conditions: we replace the dominant melody with a multi-pitch representation as input data, and describe a novel prototypical triplet loss designed to improve cover clustering. We show that these changes improve results significantly for two concrete use cases, large-dataset lookup and live song identification.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09862v1
PDF https://arxiv.org/pdf/1910.09862v1.pdf
PWC https://paperswithcode.com/paper/a-prototypical-triplet-loss-for-cover
Repo https://github.com/gdoras/PrototypicalTripletLoss
Framework tf
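
A prototypical triplet loss, as the name suggests, compares each track embedding against work-level prototypes (e.g., the mean embedding of a work's covers) rather than against individual positive and negative tracks. The PyTorch sketch below is a minimal reading of that idea; the authors' exact formulation, mining strategy, and margin may differ.

```python
import torch
import torch.nn.functional as F

def prototypical_triplet_loss(emb, work_ids, margin=1.0):
    """Triplet loss against work-level prototypes (mean cover embeddings) — a sketch.

    emb:      (N, D) track embeddings in a batch
    work_ids: (N,) integer work (song) ids
    """
    uniq = work_ids.unique()
    protos = torch.stack([emb[work_ids == w].mean(dim=0) for w in uniq])   # (K, D)
    d = torch.cdist(emb, protos)                                           # (N, K)
    pos_idx = (work_ids.unsqueeze(1) == uniq.unsqueeze(0)).float().argmax(dim=1)
    d_pos = d.gather(1, pos_idx.unsqueeze(1)).squeeze(1)
    # hardest negative prototype for each anchor
    d_neg = d.scatter(1, pos_idx.unsqueeze(1), float("inf")).min(dim=1).values
    return F.relu(d_pos - d_neg + margin).mean()
```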

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

Title PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Authors Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
Abstract We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets. Our code is available at https://github.com/epfml/powersgd.
Tasks Distributed Optimization
Published 2019-05-31
URL https://arxiv.org/abs/1905.13727v3
PDF https://arxiv.org/pdf/1905.13727v3.pdf
PWC https://paperswithcode.com/paper/powersgd-practical-low-rank-gradient
Repo https://github.com/epfml/powersgd
Framework pytorch
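
The compressor itself is a single step of power iteration: multiply the gradient matrix by a persistent low-rank query matrix, orthonormalize, and project back. A minimal PyTorch sketch of the rank-r compress/decompress step is below; error feedback and the all-reduce of the two factors, which the full method also uses, are omitted here.

```python
import torch

def powersgd_compress(grad, q):
    """One rank-r gradient compression step by power iteration (sketch).

    grad: (n, m) gradient matrix; q: (m, r) query matrix carried over between steps.
    Returns (p, q_new); the decompressed gradient is p @ q_new.T. In data-parallel
    training, p and q_new are each all-reduced instead of the full gradient.
    """
    p = grad @ q                       # (n, r)
    p, _ = torch.linalg.qr(p)          # orthonormalize the columns of p
    q_new = grad.t() @ p               # (m, r)
    return p, q_new

# usage sketch: q persists (warm start) across iterations
g = torch.randn(256, 128)
q = torch.randn(128, 2)
p, q = powersgd_compress(g, q)
g_hat = p @ q.t()                      # low-rank approximation of g
```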