October 17, 2019

Paper Group ANR 791

Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering


Title	Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering
Authors	Deepak Gupta, Sarah Kohail, Pushpak Bhattacharyya
Abstract	Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency-graph-based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can trigger a candidate answer more accurately than the previous state-of-the-art models. A comparative study on the WikiQA dataset shows a 5.86% absolute F-score improvement at the question level.
Tasks
Published	2018-08-05
URL	http://arxiv.org/abs/1808.01650v1
PDF	http://arxiv.org/pdf/1808.01650v1.pdf
PWC	https://paperswithcode.com/paper/combining-graph-based-dependency-features
Repo
Framework

Convergence Rates for Empirical Estimation of Binary Classification Bounds


Title	Convergence Rates for Empirical Estimation of Binary Classification Bounds
Authors	Salimeh Yasaei Sekeh, Morteza Noshad, Kevin R. Moon, Alfred O. Hero
Abstract	Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze-Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman-Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman-Rafsky estimator of the Henze-Penrose divergence. We validate our results experimentally and illustrate their application to real datasets.
Tasks
Published	2018-10-01
URL	http://arxiv.org/abs/1810.01015v1
PDF	http://arxiv.org/pdf/1810.01015v1.pdf
PWC	https://paperswithcode.com/paper/convergence-rates-for-empirical-estimation-of
Repo
Framework
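
The abstract above describes the Friedman-Rafsky (FR) estimator only in words: build a Euclidean minimal spanning tree over the merged samples and count the edges that join points from different samples. The following is a minimal sketch of that idea in plain Python. The normalization `1 - R (m + n) / (2 m n)` is the form commonly used for this estimator and is an assumption here, since the abstract does not state it. The function names `euclidean_mst_edges` and `fr_hp_divergence` are hypothetical.

```python
import math


def euclidean_mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (i, j) index pairs into `points`.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def fr_hp_divergence(sample_x, sample_y):
    """FR-style estimate of the HP divergence from two samples.

    Assumed normalization: 1 - R * (m + n) / (2 * m * n), where R is the
    number of MST edges joining a point of sample_x to a point of sample_y.
    """
    m, n = len(sample_x), len(sample_y)
    merged = list(sample_x) + list(sample_y)
    labels = [0] * m + [1] * n
    cross = sum(1 for i, j in euclidean_mst_edges(merged)
                if labels[i] != labels[j])
    return 1.0 - cross * (m + n) / (2.0 * m * n)
```

For two well-separated clusters the tree needs exactly one cross-sample edge, so the estimate approaches 1 as the samples grow. For heavily overlapping samples it can dip slightly below 0; this sketch does not clip it.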

Classification of radiology reports by modality and anatomy: A comparative study


Title	Classification of radiology reports by modality and anatomy: A comparative study
Authors	Marina Bendersky, Joy Wu, Tanveer Syeda-Mahmood
Abstract	Data labeling is currently a time-consuming task that often requires expert knowledge. In research settings, the availability of correctly labeled data is crucial to ensure that model predictions are accurate and useful. We propose relatively simple machine learning-based models that achieve high performance metrics in the binary and multiclass classification of radiology reports. We compare the performance of these algorithms to that of a data-driven approach based on NLP, and find that the logistic regression classifier outperforms all other models, in both the binary and multiclass classification tasks. We then choose the logistic regression binary classifier to predict chest X-ray (CXR)/ non-chest X-ray (non-CXR) labels in reports from different datasets, unseen during any training phase of any of the models. Even in unseen report collections, the binary logistic regression classifier achieves average precision values of above 0.9. Based on the regression coefficient values, we also identify frequent tokens in CXR and non-CXR reports that are features with possibly high predictive power.
Tasks
Published	2018-12-27
URL	http://arxiv.org/abs/1812.10818v1
PDF	http://arxiv.org/pdf/1812.10818v1.pdf
PWC	https://paperswithcode.com/paper/classification-of-radiology-reports-by
Repo
Framework

Generative Models for Pose Transfer


Title	Generative Models for Pose Transfer
Authors	Patrick Chao, Alexander Li, Gokul Swamy
Abstract	We investigate nearest neighbor and generative models for transferring pose between persons. We take in a video of one person performing a sequence of actions and attempt to generate a video of another person performing the same actions. Our generative model (pix2pix) outperforms k-NN at both generating corresponding frames and generalizing outside the demonstrated action set. Our most salient contribution is determining a pipeline (pose detection, face detection, k-NN based pairing) that is effective at performing the desired task. We also detail several iterative improvements and failure modes.
Tasks	Face Detection, Pose Transfer
Published	2018-06-24
URL	http://arxiv.org/abs/1806.09070v1
PDF	http://arxiv.org/pdf/1806.09070v1.pdf
PWC	https://paperswithcode.com/paper/generative-models-for-pose-transfer
Repo
Framework

Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba


Title	Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba
Authors	Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, Dik Lun Lee
Abstract	Recommender systems (RSs) have been the most important technology for increasing the business in Taobao, the largest online consumer-to-consumer (C2C) platform in China. The billion-scale data in Taobao creates three major challenges to Taobao’s RS: scalability, sparsity and cold start. In this paper, we present our technical solutions to address these three challenges. The methods are based on the graph embedding framework. We first construct an item graph from users’ behavior history. Each item is then represented as a vector using graph embedding. The item embeddings are employed to compute pairwise similarities between all items, which are then used in the recommendation process. To alleviate the sparsity and cold start problems, side information is incorporated into the embedding framework. We propose two aggregation methods to integrate the embeddings of items and the corresponding side information. Results from offline experiments show that methods incorporating side information are superior to those that do not. Further, we describe the platform upon which the embedding methods are deployed and the workflow to process the billion-scale data in Taobao. Using online A/B tests, we show that the online click-through rates (CTRs) are improved compared to the previous recommendation methods widely used in Taobao, further demonstrating the effectiveness and feasibility of our proposed methods in Taobao’s live production environment.
Tasks	Graph Embedding, Recommendation Systems
Published	2018-03-06
URL	http://arxiv.org/abs/1803.02349v2
PDF	http://arxiv.org/pdf/1803.02349v2.pdf
PWC	https://paperswithcode.com/paper/billion-scale-commodity-embedding-for-e
Repo
Framework
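
To make the pipeline in the abstract above more concrete, here is a small sketch of three of its pieces: building an item graph from behavior sessions, combining an item embedding with side-information embeddings, and scoring item pairs by similarity. Everything specific is an assumption for illustration: the edge rule (count consecutive item pairs), the plain average used as the aggregation, the cosine similarity, and the function names. The embedding training step itself is omitted, and the abstract does not specify which two aggregation methods the authors use.

```python
import math
from collections import defaultdict


def build_item_graph(sessions):
    """Weighted directed item graph from user behavior sessions.

    graph[a][b] counts how often item b directly follows item a.
    (Assumed edge rule; self-transitions are skipped.)
    """
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:
                graph[a][b] += 1
    return graph


def aggregate_mean(item_vec, side_vecs):
    """Average an item embedding with its side-information embeddings.

    A plain mean is one simple aggregation choice, used here as a stand-in.
    """
    vectors = [item_vec] + list(side_vecs)
    return [sum(vec[k] for vec in vectors) / len(vectors)
            for k in range(len(item_vec))]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A cold-start item with no graph edges still gets a usable vector this way, because its side-information embeddings (for example category or brand) contribute to the average.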

Learning 3D Human Dynamics from Video


Title	Learning 3D Human Dynamics from Video
Authors	Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik
Abstract	From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. At test time, from video, the learned temporal representation gives rise to smooth 3D mesh predictions. From a single image, our model can recover the current 3D mesh as well as its 3D past and future motion. Our approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner. Though annotated data is always limited, there are millions of videos uploaded daily on the Internet. In this work, we harvest this Internet-scale source of unlabeled data by training our model on unlabeled video with pseudo-ground truth 2D pose obtained from an off-the-shelf 2D pose detector. Our experiments show that adding more videos with pseudo-ground truth 2D pose monotonically improves 3D prediction performance. We evaluate our model, Human Mesh and Motion Recovery (HMMR), on the recent challenging dataset of 3D Poses in the Wild and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning. The project website with video, code, and data can be found at https://akanazawa.github.io/human_dynamics/.
Tasks	3D Human Dynamics, Human Dynamics
Published	2018-12-04
URL	https://arxiv.org/abs/1812.01601v4
PDF	https://arxiv.org/pdf/1812.01601v4.pdf
PWC	https://paperswithcode.com/paper/learning-3d-human-dynamics-from-video
Repo
Framework

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework


Title	DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework
Authors	Zihao Liu, Tao Liu, Wujie Wen, Lei Jiang, Jie Xu, Yanzhi Wang, Gang Quan
Abstract	As one of the most fascinating machine learning techniques, the deep neural network (DNN) has demonstrated excellent performance in various intelligent tasks such as image classification. DNNs achieve such performance, to a large extent, by performing expensive training over huge volumes of training data. To reduce the data storage and transfer overhead in smart resource-limited Internet-of-Things (IoT) systems, effective data compression is a “must-have” feature before transferring real-time produced datasets for training or classification. While there have been many well-known image compression approaches (such as JPEG), we for the first time find that a human-visual based image compression approach such as JPEG compression is not an optimized solution for DNN systems, especially with high compression ratios. To this end, we develop an image compression framework tailored for DNN applications, named “DeepN-JPEG”, to embrace the nature of the deep cascaded information processing mechanism of the DNN architecture. Extensive experiments, based on the “ImageNet” dataset with various state-of-the-art DNNs, show that “DeepN-JPEG” can achieve ~3.5x higher compression rate over the popular JPEG solution while maintaining the same accuracy level for image recognition, demonstrating its great potential for storage and power efficiency in DNN-based smart IoT system design.
Tasks	Image Classification, Image Compression
Published	2018-03-14
URL	http://arxiv.org/abs/1803.05788v1
PDF	http://arxiv.org/pdf/1803.05788v1.pdf
PWC	https://paperswithcode.com/paper/deepn-jpeg-a-deep-neural-network-favorable
Repo
Framework

Autoencoder based image compression: can the learning be quantization independent?


Title	Autoencoder based image compression: can the learning be quantization independent?
Authors	Thierry Dumas, Aline Roumy, Christine Guillemot
Abstract	This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performances of image compression are tuned by varying the quantization step size. In the case of autoencoders, this in principle would require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performances can be obtained with a unique learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time.
Tasks	Image Compression, Quantization
Published	2018-02-23
URL	http://arxiv.org/abs/1802.09371v1
PDF	http://arxiv.org/pdf/1802.09371v1.pdf
PWC	https://paperswithcode.com/paper/autoencoder-based-image-compression-can-the
Repo
Framework
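
The key mechanism in the abstract above is that one fixed transform can reach several rate-distortion points simply by changing the quantization step size at test time. The sketch below illustrates that trade-off with uniform scalar quantization. It makes several assumptions: the learned transform is replaced by a given list of coefficients, the rate is approximated by the empirical entropy of the quantization indices, the distortion is the mean squared error, and all function names are hypothetical.

```python
import math
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct values from quantization indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol, used as a rough rate proxy."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def rate_distortion_point(coeffs, step):
    """Return (rate proxy in bits per coefficient, MSE) for one step size."""
    indices = quantize(coeffs, step)
    reconstructed = dequantize(indices, step)
    mse = sum((a - b) ** 2
              for a, b in zip(coeffs, reconstructed)) / len(coeffs)
    return empirical_entropy_bits(indices), mse
```

Sweeping `step` over a range of values and calling `rate_distortion_point` on the same coefficients traces out a curve: a larger step generally lowers the rate proxy and raises the distortion, with no retraining involved.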

Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentaion


Title	Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentaion
Authors	Ting-Ting Liang, Satoshi Tsutsui, Liangcai Gao, Jing-Jing Lu, Mengyan Sun
Abstract	One of the time-consuming routine tasks for a radiologist is to discern anatomical structures from tomographic images. To assist radiologists, this paper develops an automatic segmentation method for pelvic magnetic resonance (MR) images. The task has three major challenges: 1) A pelvic organ can have various sizes and shapes depending on the axial image, which requires local contexts to segment correctly. 2) Different organs often have quite similar appearance in MR images, which requires global context to segment. 3) The number of available annotated images is too small to use the latest segmentation algorithms. To address these challenges, we propose a novel convolutional neural network called Attention-Pyramid network (APNet) that effectively exploits both local and global contexts, in addition to a data-augmentation technique that is particularly effective for MR images. In order to evaluate our method, we construct a fine-grained (50 pelvic organs) MR image segmentation dataset, and experimentally confirm the superior performance of our techniques over the state-of-the-art image segmentation methods.
Tasks	Data Augmentation, Semantic Segmentation
Published	2018-06-01
URL	http://arxiv.org/abs/1806.00264v2
PDF	http://arxiv.org/pdf/1806.00264v2.pdf
PWC	https://paperswithcode.com/paper/combining-pyramid-pooling-and-attention
Repo
Framework

Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks


Title	Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks
Authors	Atefe Rajaeefar, Ali Emami, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract	Recent advances in capsule endoscopy systems have introduced new methods and capabilities. The capsule endoscopy system, by observing the entire digestive tract, has significantly improved diagnosing gastrointestinal disorders and diseases. The system has challenges such as the need to enhance the quality of the transmitted images, low frame rates of transmission, and battery lifetime that need to be addressed. One of the important parts of a capsule endoscopy system is the image compression unit. Better compression of images increases the frame rate and hence improves the diagnosis process. In this paper a high precision compression algorithm with high compression ratio is proposed. In this algorithm we use the similarity between frames to compress the data more efficiently.
Tasks	Image Compression
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07781v1
PDF	http://arxiv.org/pdf/1802.07781v1.pdf
PWC	https://paperswithcode.com/paper/lossless-image-compression-algorithm-for
Repo
Framework

Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background


Title	Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background
Authors	Mahdi Ahmadi, Ali Emami, Mohsen Hajabdollahi, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract	With the increasing volume of telemedicine information, the need for medical image compression has become more important. In angiographic images, a small portion of the entire image usually belongs to the vasculature that provides crucial information for diagnosis. Other parts of the image are diagnostically less important and can be compressed with a higher compression ratio. However, the quality of those parts affects the visual perception of the image as well. Existing methods compress foreground and background of angiographic images using different techniques. In this paper we first utilize a convolutional neural network to segment vessels and then present a hierarchical block processing algorithm capable of both eliminating the background redundancies and preserving the overall visual quality of angiograms.
Tasks	Image Compression
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07769v1
PDF	http://arxiv.org/pdf/1802.07769v1.pdf
PWC	https://paperswithcode.com/paper/lossless-compression-of-angiogram-foreground
Repo
Framework

Speeding Up the Bilateral Filter: A Joint Acceleration Way


Title	Speeding Up the Bilateral Filter: A Joint Acceleration Way
Authors	Longquan Dai, Mengke Yuan, Xiaopeng Zhang
Abstract	The computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve a constant-time BF whose complexity is independent of the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and the shiftability property. Although each of the above techniques suffers from accuracy and efficiency problems, previous algorithm designers tended to use only one of them to assemble fast implementations because of the difficulty of combining them. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting-edge implementation that solves these problems. Jointly employing five techniques: kernel truncation, best N-term approximation as well as the previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques so that each draws on the others’ strengths to overcome its deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing efficiency at running time.
Tasks
Published	2018-02-28
URL	http://arxiv.org/abs/1803.00004v1
PDF	http://arxiv.org/pdf/1803.00004v1.pdf
PWC	https://paperswithcode.com/paper/speeding-up-the-bilateral-filter-a-joint
Repo
Framework
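
For context on the starting point of the abstract above, this is a minimal sketch of the brute-force bilateral filter whose cost grows with the kernel size. It is not the accelerated method from the paper. To stay short it works on a 1D signal and assumes Gaussian spatial and range kernels; the function name and parameters are hypothetical.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 1D signal.

    Each output sample is a weighted mean of its neighbors, where the weight
    is the product of a spatial Gaussian (distance in position) and a range
    Gaussian (difference in value). Cost is O(len(signal) * radius), which is
    the dependence on kernel size that fast methods try to remove.
    """
    n = len(signal)
    output = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            value_diff = signal[i] - signal[j]
            rng = math.exp(-(value_diff ** 2) / (2.0 * sigma_r ** 2))
            weighted_sum += spatial * rng * signal[j]
            weight_total += spatial * rng
        # weight_total is never zero: the center sample has weight 1.
        output.append(weighted_sum / weight_total)
    return output
```

With a small `sigma_r`, samples on the far side of a sharp step receive almost no weight, so the edge is preserved while flat regions are smoothed.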

Embedding Grammars


Title	Embedding Grammars
Authors	David Wingate, William Myers, Nancy Fulda, Tyler Etchart
Abstract	Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters) matched exactly. Recent advances in word embeddings show that semantically related words share common features in a vector-space representation, suggesting the possibility of a hybrid grammar and word embedding. In this paper, we blend the structure of standard context-free grammars with the semantic generalization capabilities of word embeddings to create hybrid semantic grammars. These semantic grammars generalize the specific terminals used by the programmer to other words and phrases with related meanings, allowing the construction of compact grammars that match an entire region of the vector space rather than matching specific elements.
Tasks	Intent Detection, Word Embeddings
Published	2018-08-14
URL	http://arxiv.org/abs/1808.04891v1
PDF	http://arxiv.org/pdf/1808.04891v1.pdf
PWC	https://paperswithcode.com/paper/embedding-grammars
Repo
Framework
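
The idea in the abstract above, a terminal that matches a region of the vector space instead of one exact word, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two-dimensional `TOY_EMBEDDINGS` table is made up for the example, the match rule (cosine similarity above a threshold) is assumed, and the "grammar" is reduced to a fixed sequence of terminals.

```python
import math

# Hypothetical toy embeddings, invented purely for illustration.
TOY_EMBEDDINGS = {
    "buy": (0.9, 0.1),
    "purchase": (0.85, 0.2),
    "ticket": (0.1, 0.9),
    "pass": (0.2, 0.85),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def terminal_matches(terminal, word, threshold=0.9):
    """A semantic terminal matches any known word close enough to it."""
    if terminal not in TOY_EMBEDDINGS or word not in TOY_EMBEDDINGS:
        return False
    similarity = cosine(TOY_EMBEDDINGS[terminal], TOY_EMBEDDINGS[word])
    return similarity >= threshold


def sequence_matches(terminals, words, threshold=0.9):
    """Match a fixed sequence of terminals against a token sequence."""
    return len(terminals) == len(words) and all(
        terminal_matches(t, w, threshold) for t, w in zip(terminals, words))
```

Here the rule written as `["buy", "ticket"]` also accepts `["purchase", "pass"]`, because each word falls inside the region around the corresponding terminal, while an exact-match grammar would reject it.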

Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions


Title	Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions
Authors	Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li
Abstract	Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions. However, existing methods rely heavily on fully connected layers, which introduce a significant number of parameters, making the network harder to train and prone to overfitting problems. In this paper, we propose a simple yet effective framework for point set feature learning by leveraging a nonlinear activation layer encoded by Radial Basis Function (RBF) kernels. Unlike PointNet variants, which fail to recognize local point patterns, our approach explicitly models the spatial distribution of point clouds by aggregating features from sparsely distributed RBF kernels. A typical RBF kernel, e.g. a Gaussian function, naturally penalizes long-distance response and is only activated by neighboring points. Such localized response generates highly discriminative features given different point distributions. In addition, our framework allows the joint optimization of kernel distribution and its receptive field, automatically evolving kernel configurations in an end-to-end manner. We demonstrate that the proposed network with a single RBF layer can outperform the state-of-the-art PointNet++ in terms of classification accuracy for 3D object recognition tasks. Moreover, the introduction of nonlinear mappings significantly reduces the number of network parameters and computational cost, enabling significantly faster training and a deployable point cloud recognition solution on portable devices with limited resources.
Tasks	3D Object Recognition, Object Recognition
Published	2018-12-11
URL	http://arxiv.org/abs/1812.04302v2
PDF	http://arxiv.org/pdf/1812.04302v2.pdf
PWC	https://paperswithcode.com/paper/deep-rbfnet-point-cloud-feature-learning
Repo
Framework
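
The localized response described in the abstract above can be illustrated with a few lines of Python. This is a sketch under assumptions, not the network from the paper: the kernel is a Gaussian, responses are aggregated by summing over all points (the abstract does not say which aggregation is used), kernel centers and widths are fixed inputs instead of learned parameters, and the function names are hypothetical.

```python
import math


def gaussian_rbf(point, center, sigma):
    """Gaussian RBF response: near 1 close to the center, near 0 far away."""
    squared_distance = math.dist(point, center) ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def rbf_features(points, centers, sigmas):
    """One feature per kernel: the summed response over the point cloud.

    Summation makes the feature vector independent of the point order.
    """
    return [sum(gaussian_rbf(p, c, s) for p in points)
            for c, s in zip(centers, sigmas)]
```

Each kernel reacts only to points in its neighborhood, so the feature vector records how much of the cloud lies near each center. In the paper's setting the centers and widths would be optimized jointly with the rest of the network.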

Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context


Title	Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context
Authors	Jie Li, Yahui Shan, Xiaorui Wang, Yan Li
Abstract	The use of future contextual information is typically shown to be helpful for acoustic modeling. Recently, we proposed an RNN model called minimal gated recurrent unit with input projection (mGRUIP), in which a context module, namely temporal convolution, is specifically designed to model the future context. This model, mGRUIP with context module (mGRUIP-Ctx), has been shown to be capable of utilizing the future context effectively, with quite low model latency and computation cost. In this paper, we continue to improve mGRUIP-Ctx with two revisions: applying BN methods and enlarging model context. Experimental results on two Mandarin ASR tasks (8400 hours and 60K hours) show that the revised mGRUIP-Ctx outperforms LSTM by a large margin (11% to 38%). It even performs slightly better than a superior BLSTM on the 8400h task, with 33M fewer parameters and just 290ms model latency.
Tasks
Published	2018-11-26
URL	http://arxiv.org/abs/1811.10169v1
PDF	http://arxiv.org/pdf/1811.10169v1.pdf
PWC	https://paperswithcode.com/paper/improving-gated-recurrent-unit-based-acoustic
Repo
Framework