January 27, 2020

3245 words 16 mins read

Paper Group ANR 1196

GA-GAN: CT reconstruction from Biplanar DRRs using GAN with Guided Attention. Smart Ternary Quantization. Separable Convolutional Eigen-Filters (SCEF): Building Efficient CNNs Using Redundancy Analysis. Improving Catheter Segmentation & Localization in 3D Cardiac Ultrasound Using Direction-Fused FCN. PCONV: The Missing but Desirable Sparsity in DNN …

GA-GAN: CT reconstruction from Biplanar DRRs using GAN with Guided Attention

Title GA-GAN: CT reconstruction from Biplanar DRRs using GAN with Guided Attention
Authors Ashish Sinha, Yohei Sugawara, Yuichiro Hirano
Abstract This work investigates the use of guided attention in the reconstruction of CT volumes from biplanar DRRs. We try to improve the visual image quality of the CT reconstruction using Guided Attention based GANs (GA-GAN). We also consider the use of Vector Quantization (VQ) for the CT reconstruction so that memory usage can be reduced while maintaining the same visual image quality. To the best of our knowledge, no prior work has explored Vector Quantization for this purpose. Although our findings show that our approaches outperform previous work, there is still considerable room for improvement.
Tasks Quantization
Published 2019-09-27
URL https://arxiv.org/abs/1909.12525v2
PDF https://arxiv.org/pdf/1909.12525v2.pdf
PWC https://paperswithcode.com/paper/ga-gan-ct-reconstruction-from-biplanar-drrs
Repo
Framework
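
The abstract above mentions Vector Quantization (VQ) as a way to cut memory while preserving visual quality. As a minimal, hypothetical sketch (not the authors' implementation), the NumPy snippet below shows the core VQ step: mapping each feature vector to its nearest entry in a learned codebook, so that only integer indices plus the codebook need to be stored.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each row of `features` to the nearest codebook entry.

    features: (N, D) array of feature vectors.
    codebook: (K, D) array of K learned code vectors.
    Returns (indices, quantized), where `indices` has shape (N,) and
    `quantized` is the reconstruction codebook[indices].
    """
    # Squared Euclidean distance between every feature and every code.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)           # (N,) integer codes to store
    quantized = codebook[indices]            # (N, D) dequantized vectors
    return indices, quantized

# Toy usage: 1000 8-dim features compressed to a 64-entry codebook.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 8)).astype(np.float32)
book = rng.normal(size=(64, 8)).astype(np.float32)   # would be learned in practice
idx, recon = vector_quantize(feats, book)
print(idx.shape, recon.shape)  # (1000,) (1000, 8)
```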

Smart Ternary Quantization

Title Smart Ternary Quantization
Authors Grégoire Morin, Ryan Razani, Vahid Partovi Nia, Eyyüb Sari
Abstract Neural network models are resource hungry. Low-bit quantization such as binary and ternary quantization is a common approach to alleviate these resource requirements. Ternary quantization provides a more flexible model and often beats binary quantization in terms of accuracy, but doubles memory and increases computation cost. Mixed quantization-depth models, on the other hand, allow a trade-off between accuracy and memory footprint. In such models, the quantization depth is often chosen manually (a tedious task) or tuned using a separate optimization routine (which requires training a quantized network multiple times). Here, we propose Smart Ternary Quantization (STQ), in which we modify the quantization depth directly through an adaptive regularization function, so that we train a model only once. This method jumps between binary and ternary quantization during training. We show its application on image classification.
Tasks Image Classification, Quantization
Published 2019-09-26
URL https://arxiv.org/abs/1909.12205v1
PDF https://arxiv.org/pdf/1909.12205v1.pdf
PWC https://paperswithcode.com/paper/smart-ternary-quantization-1
Repo
Framework
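
For readers unfamiliar with ternary quantization, the sketch below (an illustrative example, not the STQ algorithm itself) quantizes a weight tensor to {-s, 0, +s} using a threshold derived from the mean absolute weight; STQ additionally decides per layer, via an adaptive regularizer, whether to fall back to binary {-s, +s}.

```python
import numpy as np

def ternarize(w, delta_ratio=0.7):
    """Quantize weights to {-scale, 0, +scale}.

    w: float weight array.
    delta_ratio: threshold factor; a common heuristic sets the threshold
    to delta_ratio * mean(|w|) (an assumption here, not necessarily the
    paper's choice).
    """
    delta = delta_ratio * np.abs(w).mean()
    mask = np.abs(w) > delta                          # weights kept non-zero
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(w) * mask

w = np.random.randn(4, 4).astype(np.float32)
print(ternarize(w))
```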

Separable Convolutional Eigen-Filters (SCEF): Building Efficient CNNs Using Redundancy Analysis

Title Separable Convolutional Eigen-Filters (SCEF): Building Efficient CNNs Using Redundancy Analysis
Authors Samuel Scheidegger, Yinan Yu, Tomas McKelvey
Abstract Deep Convolutional Neural Networks (CNNs) have been widely used in computer vision due to their effectiveness. While the high model complexity of CNNs enables remarkable learning capacity, the large number of trainable parameters comes with a high cost. In addition to demanding a large amount of resources, the high complexity of the network can result in a high variance in its generalization performance from a statistical learning theory perspective. One way to reduce the complexity of a network without sacrificing its accuracy is to define and identify redundancies in order to remove them. In this work, we propose a method to observe and analyze redundancies in the weights of 2D convolutional (Conv2D) filters. From our experiments, we observe that 1) the vectorized Conv2D filters exhibit low-rank behavior; 2) the effective ranks of these filters typically decrease as the network goes deeper; and 3) these effective ranks converge over training steps. Inspired by these observations, we propose a new layer called Separable Convolutional Eigen-Filters (SCEF) as an alternative parameterization of Conv2D filters. An SCEF layer can be easily implemented using depthwise separable convolutions trained with our proposed training strategy. In addition to reducing the number of trainable parameters, depthwise separable convolutions are known to be more computationally efficient than Conv2D operations, which reduces the runtime FLOPs as well. Experiments are conducted on the CIFAR-10 and ImageNet datasets by replacing the Conv2D layers with SCEF. The results show increased accuracy using about 2/3 of the original parameters while reducing the number of FLOPs to 2/3 of the base network.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09359v2
PDF https://arxiv.org/pdf/1910.09359v2.pdf
PWC https://paperswithcode.com/paper/separable-convolutional-eigen-filters-scef
Repo
Framework
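
The redundancy analysis in the abstract rests on the observation that vectorized Conv2D filters are approximately low rank. A minimal sketch of that measurement (my own illustration, not the paper's code): flatten each filter into a row, take the SVD, and count singular values above a tolerance.

```python
import numpy as np

def effective_rank(conv_weight, rel_tol=1e-2):
    """Estimate the effective rank of a Conv2D weight tensor.

    conv_weight: array of shape (C_out, C_in, kH, kW), as in PyTorch.
    Each output filter is vectorized into a row; the effective rank is
    the number of singular values above rel_tol * largest singular value
    (the tolerance choice is an assumption for illustration).
    """
    c_out = conv_weight.shape[0]
    mat = conv_weight.reshape(c_out, -1)         # (C_out, C_in*kH*kW)
    s = np.linalg.svd(mat, compute_uv=False)     # singular values, descending
    return int((s > rel_tol * s[0]).sum())

w = np.random.randn(64, 32, 3, 3)
print(effective_rank(w))
```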

Improving Catheter Segmentation & Localization in 3D Cardiac Ultrasound Using Direction-Fused FCN

Title Improving Catheter Segmentation & Localization in 3D Cardiac Ultrasound Using Direction-Fused FCN
Authors Hongxu Yang, Caifeng Shan, Alexander F. Kolen, Peter H. N. de With
Abstract Fast and accurate catheter detection in cardiac catheterization using harmless 3D ultrasound (US) can improve the efficiency and outcome of the intervention. However, the low image quality of US requires extra training for sonographers to localize the catheter. In this paper, we propose a catheter detection method based on a pre-trained VGG network that exploits 3D information through re-organized cross-sections to segment the catheter with a shared fully convolutional network (FCN), called a Direction-Fused FCN (DF-FCN). Based on the segmented image from the DF-FCN, the catheter is localized by model fitting. Our experiments show that the proposed method can successfully detect an ablation catheter in a challenging ex-vivo 3D US dataset collected on a porcine heart. Extensive analysis shows that the proposed method achieves a Dice score of 57.7%, at least an 11.8% improvement over state-of-the-art instrument detection methods. Due to the improved segmentation performance of the DF-FCN, the catheter can be localized with an error of only 1.4 mm.
Tasks
Published 2019-02-14
URL http://arxiv.org/abs/1902.05582v1
PDF http://arxiv.org/pdf/1902.05582v1.pdf
PWC https://paperswithcode.com/paper/improving-catheter-segmentation-localization
Repo
Framework
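
The key idea in the abstract is to feed a shared 2D FCN with re-organized cross-sections of the 3D US volume instead of processing the volume with 3D convolutions directly. As a rough, hypothetical illustration of what cross-sections through a voxel could look like, the snippet below stacks the three orthogonal slices passing through a given voxel as channels of one 2D input; the actual DF-FCN slicing and fusion scheme may differ.

```python
import numpy as np

def orthogonal_slices(volume, z, y, x, size):
    """Return the axial, coronal, and sagittal patches of side `size`
    centred on voxel (z, y, x), stacked as a (3, size, size) array.
    Border handling is omitted for brevity (illustrative sketch only)."""
    h = size // 2
    axial    = volume[z, y - h:y + h, x - h:x + h]
    coronal  = volume[z - h:z + h, y, x - h:x + h]
    sagittal = volume[z - h:z + h, y - h:y + h, x]
    return np.stack([axial, coronal, sagittal], axis=0)

vol = np.random.rand(64, 64, 64).astype(np.float32)
patch = orthogonal_slices(vol, 32, 32, 32, size=24)
print(patch.shape)  # (3, 24, 24), ready as a 3-channel input to a 2D FCN
```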

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Title PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Authors Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang
Abstract Model compression techniques for Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream pruning approaches representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures but suffers from accuracy drops when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), generated from intra-convolution-kernel pruning, and connectivity sparsity, generated from inter-convolution-kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases the pruning rate while maintaining a balanced workload on filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 39.2x, 11.4x, and 6.3x, respectively, with no accuracy loss. Mobile devices can thus achieve real-time inference on large-scale DNNs.
Tasks Model Compression
Published 2019-09-06
URL https://arxiv.org/abs/1909.05073v4
PDF https://arxiv.org/pdf/1909.05073v4.pdf
PWC https://paperswithcode.com/paper/pconv-the-missing-but-desirable-sparsity-in
Repo
Framework
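
To make "fine-grained pruning patterns inside coarse-grained structures" concrete, the sketch below (an illustration under my own assumptions, not the PCONV compiler or its pattern search) keeps only a fixed 4-entry pattern inside each 3x3 kernel and drops whole kernels with small remaining magnitude, mimicking the SCP + connectivity-sparsity split.

```python
import numpy as np

# One hypothetical 3x3 pattern: keep 4 of the 9 positions.
PATTERN = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 0, 0]], dtype=np.float32)

def apply_pconv_style_sparsity(weights, keep_ratio=0.75):
    """weights: (C_out, C_in, 3, 3) conv weights.

    1) Intra-kernel sparsity: zero out positions outside PATTERN (SCP-like).
    2) Inter-kernel (connectivity) sparsity: prune the entire kernels with
       the smallest L1 norm, keeping a fraction `keep_ratio` of them.
    """
    w = weights * PATTERN                       # pattern pruning per kernel
    norms = np.abs(w).sum(axis=(2, 3))          # (C_out, C_in) kernel norms
    k = int(keep_ratio * norms.size)
    thresh = np.sort(norms, axis=None)[-k]      # keep the k largest kernels
    keep = (norms >= thresh)[..., None, None]
    return w * keep

w = np.random.randn(8, 4, 3, 3).astype(np.float32)
print((apply_pconv_style_sparsity(w) != 0).mean())  # fraction of surviving weights
```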

Context awareness and embedding for biomedical event extraction

Title Context awareness and embedding for biomedical event extraction
Authors Shankai Yan, Ka-Chun Wong
Abstract Motivation: Biomedical event detection is fundamental for information extraction in molecular biology and biomedical research. The detected events form the central basis for comprehensive biomedical knowledge fusion, facilitating the digestion of the massive information influx from the literature. Limited by their feature context, existing event detection models are mostly applicable to a single task. A general and scalable computational model is desired for biomedical knowledge management. Results: We propose a bottom-up detection framework to identify events from recognized arguments. To capture the relations between the arguments, we trained a bi-directional Long Short-Term Memory (LSTM) network to model their context embedding. Leveraging the compositional attributes, we further derived candidate samples for training event classifiers. We built our models on datasets from the BioNLP Shared Task for evaluation. Our method achieved average F-scores of 0.81 and 0.92 on the BioNLPST-BGI and BioNLPST-BB datasets, respectively. Compared with 7 state-of-the-art methods, our method nearly doubled the existing F-score performance (0.92 vs 0.56) on the BioNLPST-BB dataset. Case studies were conducted to reveal the underlying reasons. Availability: https://github.com/cskyan/evntextrc
Tasks
Published 2019-05-02
URL https://arxiv.org/abs/1905.00982v1
PDF https://arxiv.org/pdf/1905.00982v1.pdf
PWC https://paperswithcode.com/paper/context-awareness-and-embedding-for
Repo
Framework
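
The framework's context modelling is a bi-directional LSTM over argument tokens. A minimal PyTorch sketch of such a context encoder (dimensions and pooling are my assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Embed token ids and encode them with a BiLSTM; the concatenated
    final hidden states of both directions serve as the context embedding."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids):
        x = self.embed(token_ids)                    # (B, T, emb_dim)
        _, (h_n, _) = self.lstm(x)                   # h_n: (2, B, hidden_dim)
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (B, 2*hidden_dim)

enc = ContextEncoder(vocab_size=5000)
ctx = enc(torch.randint(1, 5000, (4, 20)))           # 4 sentences of 20 tokens
print(ctx.shape)                                      # torch.Size([4, 256])
```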

EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification

Title EmotionX-HSU: Adopting Pre-trained BERT for Emotion Classification
Authors Linkai Luo, Yue Wang
Abstract This paper describes our approach to EmotionX-2019, the shared task of SocialNLP 2019. To detect the emotion of each utterance in two datasets, from the TV show Friends and the Facebook chat log EmotionPush, we propose a two-step deep-learning-based methodology: (i) encode each utterance into a sequence of vectors that represent its meaning; and (ii) use a simple softmax classifier to predict which of four candidate emotions the utterance carries. Since labeled utterances are scarce, we utilise a well-trained model, BERT, to transfer part of the knowledge learned from a large corpus to our model. We then fine-tune our model until it fits the in-domain data well. The performance of the proposed model is evaluated by micro-F1 scores: 79.1% and 86.2% on the test sets of Friends and EmotionPush, respectively. Our model ranks 3rd among 11 submissions.
Tasks Emotion Classification
Published 2019-07-23
URL https://arxiv.org/abs/1907.09669v1
PDF https://arxiv.org/pdf/1907.09669v1.pdf
PWC https://paperswithcode.com/paper/emotionx-hsu-adopting-pre-trained-bert-for
Repo
Framework
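
Step (ii) of the methodology is a softmax classifier on top of a pre-trained BERT encoder. A minimal sketch using the Hugging Face transformers library (the pooling choice, head size, and hyperparameters are illustrative assumptions, not the authors' exact setup):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class UtteranceEmotionClassifier(nn.Module):
    def __init__(self, num_emotions=4):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.bert.config.hidden_size, num_emotions)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]     # [CLS] token representation
        return self.head(cls)                 # logits over the 4 emotions

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = UtteranceEmotionClassifier()
batch = tok(["I can't believe you did that!"], return_tensors="pt",
            padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(torch.softmax(logits, dim=-1))
```

Fine-tuning would then minimize cross-entropy on these logits over the labeled utterances.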

A Framework for Deep Constrained Clustering – Algorithms and Advances

Title A Framework for Deep Constrained Clustering – Algorithms and Advances
Authors Hongjing Zhang, Sugato Basu, Ian Davidson
Abstract The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering, but have several limitations. A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and, in particular, how it can extend the field of constrained clustering. We show that our framework can not only handle standard together/apart constraints generated from labeled side information (without the well-documented negative effects reported earlier) but also more complex constraints generated from new types of side information such as continuous values and high-level domain knowledge.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1901.10061v3
PDF https://arxiv.org/pdf/1901.10061v3.pdf
PWC https://paperswithcode.com/paper/deep-constrained-clustering-algorithms-and
Repo
Framework
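
To see how together/apart (must-link/cannot-link) constraints can enter a deep clustering loss, here is a small PyTorch sketch of one common formulation: reward agreement of soft cluster assignments for must-link pairs and disagreement for cannot-link pairs. This is a generic illustration, not necessarily the exact loss used in the paper.

```python
import torch

def pairwise_constraint_loss(q, must_link, cannot_link, eps=1e-8):
    """q: (N, K) soft cluster assignments (rows sum to 1).
    must_link / cannot_link: lists of (i, j) index pairs.
    Must-link pairs should agree (high sum_k q_ik * q_jk);
    cannot-link pairs should disagree (low product)."""
    loss = q.new_zeros(())
    for i, j in must_link:
        loss = loss - torch.log((q[i] * q[j]).sum() + eps)
    for i, j in cannot_link:
        loss = loss - torch.log(1.0 - (q[i] * q[j]).sum() + eps)
    return loss

q = torch.softmax(torch.randn(6, 3), dim=1)
print(pairwise_constraint_loss(q, must_link=[(0, 1)], cannot_link=[(2, 3)]))
```

In a deep setting this term would be added to the network's clustering objective and backpropagated through the assignment probabilities q.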

D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios

Title D$^2$-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios
Authors Zhengping Che, Guangyu Li, Tracy Li, Bo Jiang, Xuefeng Shi, Xinsheng Zhang, Ying Lu, Guobin Wu, Yan Liu, Jieping Ye
Abstract Driving datasets accelerate the development of intelligent driving and related computer vision technologies, while substantial and detailed annotations serve as fuel to boost the efficacy of such datasets for improving learning-based models. We propose D$^2$-City, a large-scale comprehensive collection of dashcam videos collected by vehicles on DiDi’s platform. D$^2$-City contains more than 10000 video clips that reflect the diversity and complexity of real-world traffic scenarios in China. We also provide bounding boxes and tracking annotations for 12 classes of objects in all frames of 1000 videos, and detection annotations on keyframes for the remainder of the videos. Compared with existing datasets, D$^2$-City features data in varying weather, road, and traffic conditions and a large amount of elaborate detection and tracking annotations. By bringing a diverse set of challenging cases to the community, we expect the D$^2$-City dataset will advance the perception and related areas of intelligent driving.
Tasks
Published 2019-04-03
URL https://arxiv.org/abs/1904.01975v2
PDF https://arxiv.org/pdf/1904.01975v2.pdf
PWC https://paperswithcode.com/paper/d2-city-a-large-scale-dashcam-video-dataset
Repo
Framework

Data Amplification: Instance-Optimal Property Estimation

Title Data Amplification: Instance-Optimal Property Estimation
Authors Yi Hao, Alon Orlitsky
Abstract The best-known and most commonly used distribution-property estimation technique uses a plug-in estimator, with empirical frequency replacing the underlying distribution. We present novel linear-time-computable estimators that significantly “amplify” the effective amount of data available. For a large variety of distribution properties, including four of the most popular ones, and for every underlying distribution, they achieve the accuracy that the empirical-frequency plug-in estimators would attain using a logarithmic factor more samples. Specifically, for Shannon entropy and a very broad class of properties including $\ell_1$-distance, the new estimators use $n$ samples to achieve the accuracy attained by the empirical estimators with $n\log n$ samples. For support size and coverage, the new estimators use $n$ samples to achieve the performance of empirical frequency with sample size $n$ times the logarithm of the property value. Significantly strengthening the traditional min-max formulation, these results hold not only for the worst distributions, but for each and every underlying distribution. Furthermore, the logarithmic amplification factors are optimal. Experiments on a wide variety of distributions show that the new estimators outperform the previous state-of-the-art estimators designed for each specific property.
Tasks
Published 2019-03-04
URL http://arxiv.org/abs/1903.01432v2
PDF http://arxiv.org/pdf/1903.01432v2.pdf
PWC https://paperswithcode.com/paper/data-amplification-instance-optimal-property
Repo
Framework
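
For context, the baseline that the paper "amplifies" is the empirical-frequency plug-in estimator. A short sketch of the plug-in Shannon entropy estimate from samples (this is the baseline being improved upon, not the paper's new estimator):

```python
import numpy as np

def plugin_entropy(samples):
    """Plug-in Shannon entropy (in nats): estimate the distribution by
    empirical frequencies and evaluate entropy on that estimate."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(1)
x = rng.integers(0, 100, size=5000)   # 5000 samples over a 100-symbol alphabet
print(plugin_entropy(x))              # close to log(100) ≈ 4.605 for uniform data
```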

The Usual Suspects? Reassessing Blame for VAE Posterior Collapse

Title The Usual Suspects? Reassessing Blame for VAE Posterior Collapse
Authors Bin Dai, Ziyu Wang, David Wipf
Abstract In narrow asymptotic settings, Gaussian VAE models of continuous data have been shown to possess global optima aligned with ground-truth distributions. Even so, it is well known that poor solutions, whereby the latent posterior collapses to an uninformative prior, are sometimes obtained in practice. However, contrary to conventional wisdom that largely assigns blame for this phenomenon to the undue influence of KL-divergence regularization, we will argue that posterior collapse is, at least in part, a direct consequence of bad local minima inherent to the loss surface of deep autoencoder networks. In particular, we prove that even small nonlinear perturbations of affine VAE decoder models can produce such minima, and in deeper models, analogous minima can force the VAE to behave like an aggressive truncation operator, provably discarding information along all latent dimensions in certain circumstances. Regardless, the underlying message here is not meant to undercut valuable existing explanations of posterior collapse, but rather to refine the discussion and elucidate alternative risk factors that may have been previously underappreciated.
Tasks
Published 2019-12-23
URL https://arxiv.org/abs/1912.10702v1
PDF https://arxiv.org/pdf/1912.10702v1.pdf
PWC https://paperswithcode.com/paper/the-usual-suspects-reassessing-blame-for-vae-1
Repo
Framework
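
To make "the latent posterior collapses to an uninformative prior" tangible, here is a common diagnostic (not taken from this paper): for a Gaussian VAE, compute the per-dimension KL between the encoder posterior N(mu, exp(logvar)) and the standard normal prior; dimensions whose KL stays near zero have collapsed.

```python
import torch

def kl_per_dimension(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), averaged over the batch and
    reported per latent dimension. Near-zero values indicate dimensions
    that carry no information about x, i.e. posterior collapse."""
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)   # (B, D)
    return kl.mean(dim=0)                                   # (D,)

# Toy encoder outputs: first 3 dims informative, last 5 collapsed to the prior.
mu = torch.cat([torch.randn(64, 3), torch.zeros(64, 5)], dim=1)
logvar = torch.cat([torch.full((64, 3), -1.0), torch.zeros(64, 5)], dim=1)
print(kl_per_dimension(mu, logvar))
```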

MobiVSR: A Visual Speech Recognition Solution for Mobile Devices

Title MobiVSR: A Visual Speech Recognition Solution for Mobile Devices
Authors Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent
Abstract Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. VSR has many applications as an assistive technology, especially if it can be deployed on mobile devices and embedded systems. The need for intensive computational resources and a large memory footprint are two of the major obstacles to developing neural network models for VSR in a resource-constrained environment. We propose a novel end-to-end deep neural network architecture for word-level VSR called MobiVSR, with a design parameter that aids in balancing the model’s accuracy and parameter count. We use depthwise-separable 3D convolution for the first time in the domain of VSR and show how it makes our model efficient. MobiVSR achieves an accuracy of 73% on the challenging Lip Reading in the Wild dataset with 6 times fewer parameters and a 20 times smaller memory footprint than the current state of the art. MobiVSR can also be compressed to 6 MB by applying post-training quantization.
Tasks Quantization, Speech Recognition, Visual Speech Recognition
Published 2019-05-10
URL https://arxiv.org/abs/1905.03968v3
PDF https://arxiv.org/pdf/1905.03968v3.pdf
PWC https://paperswithcode.com/paper/mobivsr-a-visual-speech-recognition-solution
Repo
Framework
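
The efficiency claim rests on depthwise-separable 3D convolutions. The PyTorch sketch below factorizes a standard Conv3d into a per-channel (depthwise) 3D convolution followed by a 1x1x1 pointwise convolution; the channel counts and kernel size are illustrative, not MobiVSR's actual configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Conv3d factorized into depthwise (groups=in_channels) + pointwise 1x1x1."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=padding, groups=in_ch, bias=False)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):                 # x: (B, C, T, H, W) video tensor
        return self.pointwise(self.depthwise(x))

std = nn.Conv3d(32, 64, 3, padding=1, bias=False)
sep = DepthwiseSeparableConv3d(32, 64)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std), count(sep))   # the separable version uses far fewer parameters
x = torch.randn(1, 32, 8, 32, 32)
print(sep(x).shape)             # torch.Size([1, 64, 8, 32, 32])
```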

No-PASt-BO: Normalized Portfolio Allocation Strategy for Bayesian Optimization

Title No-PASt-BO: Normalized Portfolio Allocation Strategy for Bayesian Optimization
Authors Thiago de P. Vasconcelos, Daniel A. R. M. A. de Souza, César L. C. Mattos, João P. P. Gomes
Abstract Bayesian Optimization (BO) is a framework for black-box optimization that is especially suitable for expensive cost functions. Among the main parts of a BO algorithm, the acquisition function is of fundamental importance, since it guides the optimization by translating the uncertainty of the regression model into a utility measure for each point to be evaluated. Considering this aspect, the selection and design of acquisition functions is one of the most popular research topics in BO. Since no single acquisition function has been shown to perform best on all tasks, a well-established approach consists of selecting different acquisition functions along the iterations of a BO execution. In such an approach, the GP-Hedge algorithm is a widely used option given its simplicity and good performance. Despite its success in various applications, GP-Hedge has the undesirable characteristic of accounting for all past performance measures of each acquisition function when selecting the next function to be used. In this case, good or bad values obtained in an initial iteration may impact the choice of the acquisition function for the rest of the algorithm. This may induce dominant behavior by one acquisition function and affect the final performance of the method. Aiming to overcome this limitation, in this work we propose a variant of GP-Hedge, named No-PASt-BO, that reduces the influence of far-past evaluations. Moreover, our method has a built-in normalization that prevents the functions in the portfolio from having similar probabilities, thus improving exploration. The obtained results on both synthetic and real-world optimization tasks indicate that No-PASt-BO presents competitive performance and always outperforms GP-Hedge.
Tasks
Published 2019-08-01
URL https://arxiv.org/abs/1908.00361v1
PDF https://arxiv.org/pdf/1908.00361v1.pdf
PWC https://paperswithcode.com/paper/no-past-bo-normalized-portfolio-allocation
Repo
Framework
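
The mechanism being modified is GP-Hedge's softmax selection over the cumulative past rewards of each acquisition function. The sketch below shows that selection rule with an exponential forgetting factor that downweights far-past rewards, which is the spirit of No-PASt-BO; the paper's exact normalization differs, so treat this as an illustrative approximation.

```python
import numpy as np

def hedge_select(rewards_history, eta=1.0, discount=0.9, rng=None):
    """Choose an acquisition-function index via a hedge-style softmax.

    rewards_history: (T, A) array, reward of each of A acquisition
    functions at each past iteration. `discount` < 1 downweights old
    iterations (set discount=1.0 to recover plain GP-Hedge gains).
    """
    rng = rng or np.random.default_rng()
    T = rewards_history.shape[0]
    weights = discount ** np.arange(T - 1, -1, -1)   # newest iteration gets weight 1
    gains = (weights[:, None] * rewards_history).sum(axis=0)
    gains = gains - gains.max()                      # numerical stability
    probs = np.exp(eta * gains)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

hist = np.random.rand(20, 3)     # 20 iterations, 3 acquisition functions
idx, p = hedge_select(hist)
print(idx, np.round(p, 3))
```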

PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds

Title PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds
Authors Oliver M. Crook, Tim Hurst, Carola-Bibiane Schönlieb, Matthew Thorpe, Konstantinos C. Zygalakis
Abstract Given a data set and a subset of labels, the problem of semi-supervised learning on point clouds is to extend the labels to the entire data set. In this paper we extend the labels by minimising the constrained discrete $p$-Dirichlet energy. Under suitable conditions the discrete problem can be connected, in the large data limit, with the minimiser of a weighted continuum $p$-Dirichlet energy with the same constraints. We take advantage of this connection by designing numerical schemes that first estimate the density of the data and then apply PDE methods, such as pseudo-spectral methods, to solve the corresponding Euler-Lagrange equation. We prove that our scheme is consistent in the large data limit for two methods of density estimation: kernel density estimation and spline kernel density estimation.
Tasks
Published 2019-09-23
URL https://arxiv.org/abs/1909.10221v1
PDF https://arxiv.org/pdf/1909.10221v1.pdf
PWC https://paperswithcode.com/paper/pde-inspired-algorithms-for-semi-supervised
Repo
Framework
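
For the special case p = 2, minimising the discrete Dirichlet energy with the labels held fixed reduces to solving a graph Laplacian system for the unlabeled points. The sketch below illustrates that baseline on a small point cloud with Gaussian edge weights; the paper's actual contribution (continuum limits and PDE/pseudo-spectral solvers for general p) is not reproduced here.

```python
import numpy as np

def laplace_learning(points, labeled_idx, labels, sigma=0.5):
    """Binary label propagation by minimizing the discrete 2-Dirichlet energy.

    points: (N, d) point cloud; labeled_idx: indices with known labels;
    labels: values in {0, 1} for those indices. Returns scores in [0, 1]."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                        # graph Laplacian

    unlabeled = np.setdiff1d(np.arange(len(points)), labeled_idx)
    f = np.zeros(len(points))
    f[labeled_idx] = labels
    # Solve L_uu f_u = -L_ul f_l for the unlabeled values.
    f[unlabeled] = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)],
                                   -L[np.ix_(unlabeled, labeled_idx)] @ labels)
    return f

pts = np.random.rand(50, 2)
scores = laplace_learning(pts, labeled_idx=np.array([0, 1]),
                          labels=np.array([0.0, 1.0]))
print(scores.min(), scores.max())
```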

Deep Learning for Bug-Localization in Student Programs

Title Deep Learning for Bug-Localization in Student Programs
Authors Rahul Gupta, Aditya Kanade, Shirish Shevade
Abstract Providing feedback is an integral part of teaching. Most open online courses on programming make use of automated grading systems to support programming assignments and give real-time feedback. These systems usually rely on test results to quantify a program’s functional correctness, and they return failing tests to the students as feedback. However, students may find it difficult to debug their programs if they receive no hints about where the bug is and how to fix it. In this work, we present the first deep-learning-based technique that can localize bugs in a faulty program with respect to a failing test, without even running the program. At the heart of our technique is a novel tree convolutional neural network which is trained to predict whether a program passes or fails a given test. To localize the bugs, we analyze the trained network using a state-of-the-art neural prediction attribution technique and see which lines of the program lead it to predict the test outcomes. Our experiments show that the proposed technique is generally more accurate than two state-of-the-art program-spectrum-based baselines and one syntactic-difference-based bug-localization baseline.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12454v1
PDF https://arxiv.org/pdf/1905.12454v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-bug-localization-in-student
Repo
Framework
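
The second stage of the method attributes the trained network's pass/fail prediction back to program lines. The paper uses a state-of-the-art neural prediction attribution technique; as a simpler, hypothetical stand-in, the sketch below scores each line embedding with gradient-times-input attribution against a toy classifier.

```python
import torch
import torch.nn as nn

def input_gradient_attribution(model, line_embeddings, target_class):
    """Score each program line by |d logit / d embedding · embedding|
    (plain gradient x input; the paper relies on a more refined
    neural prediction attribution method).

    line_embeddings: (num_lines, D) tensor, one vector per program line."""
    x = line_embeddings.clone().requires_grad_(True)
    logits = model(x.unsqueeze(0))                 # (1, num_classes)
    logits[0, target_class].backward()
    scores = (x.grad * x).sum(dim=1).abs()         # one score per line
    return scores / (scores.max() + 1e-8)

# Toy stand-in for the trained pass/fail predictor (hypothetical shapes).
model = nn.Sequential(nn.Flatten(), nn.Linear(10 * 16, 2))
lines = torch.randn(10, 16)                        # 10 lines, 16-dim embeddings
print(input_gradient_attribution(model, lines, target_class=1))
```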