January 27, 2020


Paper Group ANR 1194


Outside the Box: Abstraction-Based Monitoring of Neural Networks


Title	Outside the Box: Abstraction-Based Monitoring of Neural Networks
Authors	Thomas A. Henzinger, Anna Lukina, Christian Schilling
Abstract	Neural networks have demonstrated unmatched performance in a range of classification tasks. Despite numerous efforts of the research community, novelty detection remains one of the significant limitations of neural networks. The ability to identify previously unseen inputs as novel is crucial for our understanding of the decisions made by neural networks. At runtime, inputs not falling into any of the categories learned during training cannot be classified correctly by the neural network. Existing approaches treat the neural network as a black box and try to detect novel inputs based on the confidence of the output predictions. However, neural networks are not trained to reduce their confidence for novel inputs, which limits the effectiveness of these approaches. We propose a framework to monitor a neural network by observing the hidden layers. We employ a common abstraction from program analysis - boxes - to identify novel behaviors in the monitored layers, i.e., inputs that cause behaviors outside the box. For each neuron, the boxes range over the values seen in training. The framework is efficient and flexible to achieve a desired trade-off between raising false warnings and detecting novel inputs. We illustrate the performance and the robustness to variability in the unknown classes on popular image-classification benchmarks.
Tasks	Image Classification
Published	2019-11-20
URL	https://arxiv.org/abs/1911.09032v3
PDF	https://arxiv.org/pdf/1911.09032v3.pdf
PWC	https://paperswithcode.com/paper/outside-the-box-abstraction-based-monitoring
Repo
Framework
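
The box abstraction described in the abstract above can be illustrated with a minimal sketch. It assumes one box per class over a single monitored layer, with each neuron's interval spanning the values seen in training. The names `build_box`, `enlarge`, and `is_novel`, and the enlargement parameter `gamma`, are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (low, high) interval per monitored neuron


def build_box(activations: Sequence[Sequence[float]]) -> Box:
    """Smallest axis-aligned box containing all training activation vectors."""
    columns = list(zip(*activations))
    return [(min(col), max(col)) for col in columns]


def enlarge(box: Box, gamma: float) -> Box:
    """Widen every interval by a fraction gamma of its width.

    Assumption: a knob like this is one way to trade fewer false
    warnings against fewer detected novelties.
    """
    widened = []
    for low, high in box:
        margin = gamma * (high - low) / 2
        widened.append((low - margin, high + margin))
    return widened


def is_novel(box: Box, activation: Sequence[float]) -> bool:
    """Raise a warning if any neuron value falls outside its interval."""
    return any(not (low <= v <= high) for (low, high), v in zip(box, activation))


# Toy hidden-layer activations recorded for one class during training.
train_class_0 = [[0.1, 2.0], [0.4, 2.5], [0.3, 1.8]]
box = build_box(train_class_0)

print(is_novel(box, [0.2, 2.1]))  # activation inside the box
print(is_novel(box, [0.9, 2.1]))  # first neuron exceeds its training range
```

At runtime, the monitor would look up the box of the predicted class and call `is_novel` on the observed activations of the monitored layer.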

Early Bird Catches the Worm: Predicting Returns Even Before Purchase in Fashion E-commerce


Title	Early Bird Catches the Worm: Predicting Returns Even Before Purchase in Fashion E-commerce
Authors	Sajan Kedia, Manchit Madan, Sumit Borar
Abstract	With the rapid growth in fashion e-commerce and customer-friendly product return policies, the cost to handle returned products has become a significant challenge. E-tailers incur huge losses in terms of reverse logistics costs and liquidation costs due to damaged returns or fraudulent behavior. Accurate prediction of product returns prior to order placement can be critical for companies. It can help e-tailers take preemptive measures even before the order is placed, hence reducing overall returns. Furthermore, finding the return probability for millions of customers at the cart page in real time can be difficult. To address this problem we propose a novel approach based on a deep neural network. Users' tastes and products' latent hidden features were captured using product embeddings based on Bayesian Personalized Ranking (BPR). Another set of embeddings, which captured users' body shape and size, was learned using a skip-gram-based model. The deep neural network incorporates these embeddings along with the engineered features to predict return probability. Using this return probability, several live experiments were conducted on one of the major fashion e-commerce platforms in order to reduce overall returns.
Tasks
Published	2019-06-28
URL	https://arxiv.org/abs/1906.12128v1
PDF	https://arxiv.org/pdf/1906.12128v1.pdf
PWC	https://paperswithcode.com/paper/early-bird-catches-the-worm-predicting
Repo
Framework

Black Box Submodular Maximization: Discrete and Continuous Settings


Title	Black Box Submodular Maximization: Discrete and Continuous Settings
Authors	Lin Chen, Mingrui Zhang, Hamed Hassani, Amin Karbasi
Abstract	In this paper, we consider the problem of black box continuous submodular maximization where we only have access to the function values and no information about the derivatives is provided. For a monotone and continuous DR-submodular function, and subject to a bounded convex body constraint, we propose Black-box Continuous Greedy, a derivative-free algorithm that provably achieves the tight $[(1-1/e)OPT-\epsilon]$ approximation guarantee with $O(d/\epsilon^3)$ function evaluations. We then extend our result to the stochastic setting where function values are subject to stochastic zero-mean noise. It is through this stochastic generalization that we revisit the discrete submodular maximization problem and use the multi-linear extension as a bridge between discrete and continuous settings. Finally, we extensively evaluate the performance of our algorithm on continuous and discrete submodular objective functions using both synthetic and real data.
Tasks
Published	2019-01-28
URL	https://arxiv.org/abs/1901.09515v2
PDF	https://arxiv.org/pdf/1901.09515v2.pdf
PWC	https://paperswithcode.com/paper/black-box-submodular-maximization-discrete
Repo
Framework
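
The derivative-free idea in the abstract above can be sketched as follows, under simplifying assumptions. The gradient is estimated from function values only, here with plain coordinate-wise forward differences rather than the smoothed estimator a black-box method would typically use, and that estimate drives a continuous-greedy (Frank-Wolfe-style) update. The constraint set `{x >= 0, sum(x) <= budget}`, the toy objective, and all function names are hypothetical choices for illustration, not the paper's algorithm or experiments.

```python
import math
from typing import Callable, List

Vector = List[float]


def estimate_gradient(f: Callable[[Vector], float], x: Vector, delta: float = 1e-4) -> Vector:
    """Forward-difference gradient estimate that only queries function values."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += delta
        grad.append((f(shifted) - fx) / delta)
    return grad


def linear_maximizer(grad: Vector, budget: float) -> Vector:
    """Maximize <v, grad> over the set {v >= 0, sum(v) <= budget}."""
    best = max(range(len(grad)), key=lambda i: grad[i])
    v = [0.0] * len(grad)
    if grad[best] > 0:
        v[best] = budget
    return v


def continuous_greedy(f: Callable[[Vector], float], dim: int, budget: float, steps: int = 100) -> Vector:
    """Start at the origin and take `steps` moves of size 1/steps toward the best vertex."""
    x = [0.0] * dim
    for _ in range(steps):
        v = linear_maximizer(estimate_gradient(f, x), budget)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x


# Toy monotone DR-submodular objective: a separable sum of concave terms.
weights = [3.0, 1.0, 0.5]


def objective(x: Vector) -> float:
    return sum(math.log1p(w * xi) for w, xi in zip(weights, x))


solution = continuous_greedy(objective, dim=3, budget=1.0)
print(solution, objective(solution))
```

Because the final point is an average of feasible vertices, it stays inside the constraint set without any projection step. The forward difference evaluates the objective slightly outside the set, which is harmless here only because the toy objective is defined on the whole nonnegative orthant.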

Salient Instance Segmentation via Subitizing and Clustering


Title	Salient Instance Segmentation via Subitizing and Clustering
Authors	Jialun Pei, He Tang, Chao Liu, Chuanbo Chen
Abstract	The goal of salient region detection is to identify the regions of an image that attract the most attention. Many methods have achieved state-of-the-art performance levels on this task. Recently, salient instance segmentation has become an even more challenging task than traditional salient region detection; however, few of the existing methods have concentrated on this underexplored problem. Unlike the existing methods, which usually employ object proposals to roughly count and locate object instances, our method applies salient objects subitizing to predict an accurate number of instances for salient instance segmentation. In this paper, we propose a multitask densely connected neural network (MDNN) to segment salient instances in an image. In contrast to existing approaches, our framework is proposal-free and category-independent. The MDNN contains two parallel branches: the first is a densely connected subitizing network (DSN) used for subitizing prediction; the second is a densely connected fully convolutional network (DFCN) used for salient region detection. The MDNN simultaneously outputs saliency maps and salient object subitizing. Then, an adaptive deep feature-based spectral clustering operation segments the salient regions into instances based on the subitizing and saliency maps. The experimental results on both salient region detection and salient instance segmentation datasets demonstrate the satisfactory performance of our framework. Notably, its APr@0.5 and APr@0.7 reach 73.46% and 60.14% on the salient instance dataset, substantially higher than the results achieved by the state-of-the-art algorithm.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-09-29
URL	https://arxiv.org/abs/1909.13240v1
PDF	https://arxiv.org/pdf/1909.13240v1.pdf
PWC	https://paperswithcode.com/paper/salient-instance-segmentation-via-subitizing
Repo
Framework

Distilling importance sampling


Title	Distilling importance sampling
Authors	Dennis Prangle
Abstract	The two main approaches to Bayesian inference are sampling and optimisation methods. However many complicated posteriors are difficult to approximate by either. Therefore we propose a novel approach combining features of both. We use a flexible parameterised family of densities, such as a normalising flow. Given a density from this family approximating the posterior, we use importance sampling to produce a weighted sample from a more accurate posterior approximation. This sample is then used in optimisation to update the parameters of the approximate density, which we view as distilling the importance sampling results. We iterate these steps and gradually improve the quality of the posterior approximation. We illustrate our method in two challenging examples: a queueing model and a stochastic differential equation model.
Tasks	Bayesian Inference
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03632v2
PDF	https://arxiv.org/pdf/1910.03632v2.pdf
PWC	https://paperswithcode.com/paper/distilling-importance-sampling
Repo
Framework
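
The sample-then-distill loop in the abstract above can be sketched in a deliberately simplified form. It assumes a one-dimensional Gaussian family in place of a normalising flow, and it replaces the gradient-based optimisation step with closed-form weighted moment matching, which is the weighted maximum-likelihood fit for a Gaussian. The stand-in target, the function names, and the starting values are hypothetical; tempering or weight truncation, which a practical version would likely need, is omitted.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalised log posterior; a stand-in Gaussian with mean 3 and std 0.5."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def distill_step(mu: float, sigma: float, rng: random.Random, n: int = 2000):
    """One iteration: importance-sample from the current fit, then refit to the weighted sample."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    log_w = [log_target(x) - log_normal_pdf(x, mu, sigma) for x in xs]
    top = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]  # self-normalised importance weights
    new_mu = sum(wi * x for wi, x in zip(w, xs))
    new_var = sum(wi * (x - new_mu) ** 2 for wi, x in zip(w, xs))
    return new_mu, max(math.sqrt(new_var), 1e-3)  # floor guards against collapse


rng = random.Random(0)
mu, sigma = 0.0, 3.0  # deliberately broad initial approximation
for _ in range(5):
    mu, sigma = distill_step(mu, sigma, rng)
print(round(mu, 2), round(sigma, 2))
```

The broad starting distribution matters in this sketch: with a narrow one, very few samples would land near the target and the weighted fit could collapse onto a handful of points.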

Variational inference for neural network matrix factorization and its application to stochastic blockmodeling


Title	Variational inference for neural network matrix factorization and its application to stochastic blockmodeling
Authors	Onno Kampman, Creighton Heaukulani
Abstract	We consider the probabilistic analogue to neural network matrix factorization (Dziugaite & Roy, 2015), which we construct with Bayesian neural networks and fit with variational inference. We find that a linear model fit with variational inference can attain equivalent predictive performance to the regular neural network variants on the Movielens data sets. We discuss the implications of this result, which include some suggestions on the pros and cons of using the neural network construction, as well as the variational approach to inference. Such a probabilistic approach is required, however, when considering the important class of stochastic block models. We describe a variational inference algorithm for a neural network matrix factorization model with nonparametric block structure and evaluate its performance on the NIPS co-authorship data set.
Tasks
Published	2019-05-11
URL	https://arxiv.org/abs/1905.04502v3
PDF	https://arxiv.org/pdf/1905.04502v3.pdf
PWC	https://paperswithcode.com/paper/variational-inference-for-neural-network
Repo
Framework

On Extracting Data from Tables that are Encoded using HTML


Title	On Extracting Data from Tables that are Encoded using HTML
Authors	Juan C. Roldán, Patricia Jiménez, Rafael Corchuelo
Abstract	Tables are a common means to display data in human-friendly formats. Many authors have worked on proposals to extract those data back since this has many interesting applications. In this article, we summarise and compare many of the proposals to extract data from tables that are encoded using HTML and have been published between 2000 and 2018. We first present a vocabulary that homogenises the terminology used in this field; next, we use it to summarise the proposals; finally, we compare them side by side. Our analysis highlights several challenges to which no proposal provides a conclusive solution and a few more that have not been addressed sufficiently; simply put, no proposal provides a complete solution to the problem, which seems to suggest that this research field will remain active in the near future. We have also realised that there is no consensus regarding the datasets and the methods used to evaluate the proposals, which hampers comparing the experimental results.
Tasks
Published	2019-03-20
URL	http://arxiv.org/abs/1903.08305v2
PDF	http://arxiv.org/pdf/1903.08305v2.pdf
PWC	https://paperswithcode.com/paper/on-extracting-data-from-html-tables
Repo
Framework

Enhanced generative adversarial network for 3D brain MRI super-resolution


Title	Enhanced generative adversarial network for 3D brain MRI super-resolution
Authors	Jiancong Wang, Yuhua Chen, Yifan Wu, Jianbo Shi, James Gee
Abstract	Single image super-resolution (SISR) reconstruction for magnetic resonance imaging (MRI) has generated significant interest because of its potential to not only speed up imaging but to improve quantitative processing and analysis of available image data. Generative Adversarial Networks (GAN) have proven to perform well in recovering image texture detail, and many variants have therefore been proposed for SISR. In this work, we develop an enhancement to tackle GAN-based 3D SISR by introducing a new residual-in-residual dense block (RRDG) generator that is both memory efficient and achieves state-of-the-art performance in terms of PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity) and NRMSE (Normalized Root Mean Squared Error) metrics. We also introduce a patch GAN discriminator with improved convergence behavior to better model brain image texture. We propose a novel anatomical fidelity evaluation of the results using a pre-trained brain parcellation network. Finally, these developments are combined through a simple and efficient method to balance between image and texture quality in the final output.
Tasks	Image Super-Resolution, Super-Resolution
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04835v2
PDF	https://arxiv.org/pdf/1907.04835v2.pdf
PWC	https://paperswithcode.com/paper/enhanced-generative-adversarial-network-for
Repo
Framework
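
Two of the metrics named in the abstract above, PSNR and NRMSE, follow directly from the mean squared error. The sketch below works on flattened intensity lists and assumes that NRMSE is normalised by the intensity range of the reference image; other conventions (normalising by the mean or by the Euclidean norm) are also in use, and the abstract does not say which one applies. The function names and toy values are illustrative.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], est: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    return sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)


def psnr(ref: Sequence[float], est: Sequence[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    err = mse(ref, est)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range ** 2 / err)


def nrmse(ref: Sequence[float], est: Sequence[float]) -> float:
    """RMSE divided by the reference intensity range (assumed convention)."""
    return math.sqrt(mse(ref, est)) / (max(ref) - min(ref))


# Toy "images" with intensities in [0, 1].
reference = [0.0, 0.5, 1.0, 0.5]
estimate = [0.1, 0.5, 0.9, 0.5]

print(psnr(reference, estimate, data_range=1.0))
print(nrmse(reference, estimate))
```

Higher PSNR and lower NRMSE both indicate a reconstruction closer to the reference; SSIM is omitted here because it requires local window statistics.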

Embodiment dictates learnability in neural controllers


Title	Embodiment dictates learnability in neural controllers
Authors	Joshua Powers, Ryan Grindle, Sam Kriegman, Lapo Frati, Nick Cheney, Josh Bongard
Abstract	Catastrophic forgetting continues to severely restrict the learnability of controllers suitable for multiple task environments. Efforts to combat catastrophic forgetting reported in the literature to date have focused on how control systems can be updated more rapidly, hastening their adjustment from good initial settings to new environments, or more circumspectly, suppressing their ability to overfit to any one environment. When using robots, the environment includes the robot’s own body, its shape and material properties, and how its actuators and sensors are distributed along its mechanical structure. Here we demonstrate for the first time how one such design decision (sensor placement) can alter the landscape of the loss function itself, either expanding or shrinking the weight manifolds containing suitable controllers for each individual task, thus increasing or decreasing their probability of overlap across tasks, and thus reducing or inducing the potential for catastrophic forgetting.
Tasks
Published	2019-10-15
URL	https://arxiv.org/abs/1910.07487v1
PDF	https://arxiv.org/pdf/1910.07487v1.pdf
PWC	https://paperswithcode.com/paper/embodiment-dictates-learnability-in-neural
Repo
Framework

Synthetic Elastography using B-mode Ultrasound through a Deep Fully-Convolutional Neural Network


Title	Synthetic Elastography using B-mode Ultrasound through a Deep Fully-Convolutional Neural Network
Authors	R. R. Wildeboer, R. J. G. van Sloun, C. K. Mannaerts, G. Salomon, H. Wijkstra, M. Mischi
Abstract	Shear-wave elastography (SWE) permits local estimation of tissue elasticity, an important imaging marker in biomedicine. This recently-developed, advanced technique assesses the speed of a laterally-travelling shear wave after an acoustic radiation force “push” to estimate local Young’s moduli in an operator-independent fashion. In this work, we show how synthetic SWE (sSWE) images can be generated based on conventional B-mode imaging through deep learning. Using side-by-side-view B-mode/SWE images collected in 50 patients with prostate cancer, we show that sSWE images with a pixel-wise mean absolute error of 4.8 kPa with regard to the original SWE can be generated. Visualization of high-level feature levels through t-Distributed Stochastic Neighbor Embedding reveals a high degree of overlap between data from different scanners. Also qualitatively, sSWE results seem generalisable to single B-mode acquisitions and other scanners. In the future, we envision sSWE as a reliable elasticity-related tissue typing strategy that is solely based on B-mode ultrasound acquisition.
Tasks
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03573v1
PDF	https://arxiv.org/pdf/1908.03573v1.pdf
PWC	https://paperswithcode.com/paper/synthetic-elastography-using-b-mode
Repo
Framework

Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints


Title	Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints
Authors	Andretti Naiden, Vlad Paunescu, Gyeongmo Kim, ByeongMoon Jeon, Marius Leordeanu
Abstract	We propose Shift R-CNN, a hybrid model for monocular 3D object detection, which combines deep learning with the power of geometry. We adapt a Faster R-CNN network for regressing initial 2D and 3D object properties and combine it with a least squares solution for the inverse 2D to 3D geometric mapping problem, using the camera projection matrix. The closed-form solution of the mathematical system, along with the initial output of the adapted Faster R-CNN are then passed through a final ShiftNet network that refines the result using our newly proposed Volume Displacement Loss. Our novel, geometrically constrained deep learning approach to monocular 3D object detection obtains top results on KITTI 3D Object Detection Benchmark, being the best among all monocular methods that do not use any pre-trained network for depth estimation.
Tasks	3D Object Detection, Depth Estimation, Object Detection
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09970v1
PDF	https://arxiv.org/pdf/1905.09970v1.pdf
PWC	https://paperswithcode.com/paper/shift-r-cnn-deep-monocular-3d-object
Repo
Framework

Learning Interpretable Disease Self-Representations for Drug Repositioning


Title	Learning Interpretable Disease Self-Representations for Drug Repositioning
Authors	Fabrizio Frasca, Diego Galeano, Guadalupe Gonzalez, Ivan Laponogov, Kirill Veselkov, Alberto Paccanaro, Michael M. Bronstein
Abstract	Drug repositioning is an attractive cost-efficient strategy for the development of treatments for human diseases. Here, we propose an interpretable model that learns disease self-representations for drug repositioning. Our self-representation model represents each disease as a linear combination of a few other diseases. We enforce proximity in the learnt representations in a way to preserve the geometric structure of the human phenome network - a domain-specific knowledge that naturally adds relational inductive bias to the disease self-representations. We prove that our method is globally optimal and show results outperforming state-of-the-art drug repositioning approaches. We further show that the disease self-representations are biologically interpretable.
Tasks
Published	2019-09-14
URL	https://arxiv.org/abs/1909.06609v2
PDF	https://arxiv.org/pdf/1909.06609v2.pdf
PWC	https://paperswithcode.com/paper/learning-interpretable-disease-self
Repo
Framework

Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks


Title	Toward Ergonomic Risk Prediction via Segmentation of Indoor Object Manipulation Actions Using Spatiotemporal Convolutional Networks
Authors	Behnoosh Parsa, Ekta U. Samani, Rose Hendrix, Cameron Devine, Shashi M. Singh, Santosh Devasia, Ashis G. Banerjee
Abstract	Automated real-time prediction of the ergonomic risks of manipulating objects is a key unsolved challenge in developing effective human-robot collaboration systems for logistics and manufacturing applications. We present a foundational paradigm to address this challenge by formulating the problem as one of action segmentation from RGB-D camera videos. Spatial features are first learned using a deep convolutional model from the video frames, which are then fed sequentially to temporal convolutional networks to semantically segment the frames into a hierarchy of actions, which are either ergonomically safe, require monitoring, or need immediate attention. For performance evaluation, in addition to an open-source kitchen dataset, we collected a new dataset comprising twenty individuals picking up and placing objects of varying weights to and from cabinet and table locations at various heights. Results show very high (87-94)% F1 overlap scores among the ground truth and predicted frame labels for videos lasting over two minutes and consisting of a large number of actions.
Tasks	Action Segmentation
Published	2019-02-14
URL	https://arxiv.org/abs/1902.05176v2
PDF	https://arxiv.org/pdf/1902.05176v2.pdf
PWC	https://paperswithcode.com/paper/predicting-ergonomic-risks-during-indoor
Repo
Framework

An End-to-end Video Text Detector with Online Tracking


Title	An End-to-end Video Text Detector with Online Tracking
Authors	Hongyuan Yu, Chengquan Zhang, Xuan Li, Junyu Han, Errui Ding, Liang Wang
Abstract	Video text detection is considered one of the most difficult tasks in document analysis due to the following two challenges: 1) the difficulties caused by video scenes, i.e., motion blur, illumination changes, and occlusion; 2) the properties of text including variants of fonts, languages, orientations, and shapes. Most existing methods attempt to enhance the performance of video text detection by cooperating with video text tracking, but treat these two tasks separately. In this work, we propose an end-to-end video text detection model with online tracking to address these two challenges. Specifically, in the detection branch, we adopt ConvLSTM to capture spatial structure information and motion memory. In the tracking branch, we convert the tracking problem to text instance association, and an appearance-geometry descriptor with memory mechanism is proposed to generate robust representation of text instances. By integrating these two branches into one trainable framework, they can promote each other and the computational cost is significantly reduced. Experiments on existing video text benchmarks including ICDAR2013 Video, Minetto and YVT demonstrate that the proposed method significantly outperforms state-of-the-art methods. Our method improves F-score by about 2 on all datasets and runs in real time at 24.36 fps on a TITAN Xp.
Tasks
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07135v1
PDF	https://arxiv.org/pdf/1908.07135v1.pdf
PWC	https://paperswithcode.com/paper/an-end-to-end-video-text-detector-with-online
Repo
Framework

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference


Title	AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference
Authors	Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, Gu-Yeon Wei
Abstract	Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. And notably, AdaptivFloat is seen surpassing baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that is 0.9$\times$ and 1.14$\times$, respectively, that of equivalent bit width integer-based accelerator variants.
Tasks	Quantization
Published	2019-09-29
URL	https://arxiv.org/abs/1909.13271v3
PDF	https://arxiv.org/pdf/1909.13271v3.pdf
PWC	https://paperswithcode.com/paper/adaptivfloat-a-floating-point-based-data-type
Repo
Framework