October 17, 2019

3027 words 15 mins read

Paper Group ANR 791

Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering. Convergence Rates for Empirical Estimation of Binary Classification Bounds. Classification of radiology reports by modality and anatomy: A comparative study. Generative Models for Pose Transfer. Billion-scale Commodity Embedding for E-commerce Recomme …

Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering

Title Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering
Authors Deepak Gupta, Sarah Kohail, Pushpak Bhattacharyya
Abstract Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency-graph-based alignment features, namely graph edit distance, graph-based similarity, and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can more accurately trigger a candidate answer compared to the previous state-of-the-art models. A comparative study on the WikiQA dataset shows a 5.86% absolute F-score improvement at the question level.
Tasks
Published 2018-08-05
URL http://arxiv.org/abs/1808.01650v1
PDF http://arxiv.org/pdf/1808.01650v1.pdf
PWC https://paperswithcode.com/paper/combining-graph-based-dependency-features
Repo
Framework
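
A minimal sketch of the feature combination this entry describes, under assumed inputs: dependency-graph alignment features (graph edit distance, a Jaccard-style graph similarity, and graph coverage) concatenated with a dense CNN embedding before a logistic triggering score. This is not the authors' code; the graph construction, the CNN encoder (`cnn_vec`), and the weights `w` are all illustrative stand-ins.

```python
import numpy as np
import networkx as nx

def dependency_graph(edges):
    """Build a directed dependency graph from (head, dependent, label) triples."""
    g = nx.DiGraph()
    for head, dep, label in edges:
        g.add_edge(head, dep, label=label)
    return g

def alignment_features(q_graph, a_graph):
    """Three graph-based alignment features, named after the abstract."""
    ged = nx.graph_edit_distance(q_graph, a_graph)  # exact GED; fine for small graphs
    q_nodes, a_nodes = set(q_graph.nodes), set(a_graph.nodes)
    jaccard = len(q_nodes & a_nodes) / max(len(q_nodes | a_nodes), 1)  # graph similarity
    coverage = len(q_nodes & a_nodes) / max(len(q_nodes), 1)           # graph coverage
    return np.array([ged, jaccard, coverage])

def score(q_graph, a_graph, cnn_vec, w):
    """Concatenate graph features with the dense embedding; logistic triggering score."""
    feats = np.concatenate([alignment_features(q_graph, a_graph), cnn_vec])
    return 1.0 / (1.0 + np.exp(-w @ feats))

q = dependency_graph([("play", "who", "nsubj"), ("play", "guitar", "obj")])
a = dependency_graph([("plays", "he", "nsubj"), ("plays", "guitar", "obj")])
cnn_vec = np.zeros(4)                # stand-in for the CNN's dense embedding
w = np.full(3 + len(cnn_vec), 0.1)   # untrained, illustrative classifier weights
print(score(q, a, cnn_vec, w))
```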

Convergence Rates for Empirical Estimation of Binary Classification Bounds

Title Convergence Rates for Empirical Estimation of Binary Classification Bounds
Authors Salimeh Yasaei Sekeh, Morteza Noshad, Kevin R. Moon, Alfred O. Hero
Abstract Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze-Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman-Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman-Rafsky estimator of the Henze-Penrose divergence. We validate our results experimentally and illustrate their application to real datasets.
Tasks
Published 2018-10-01
URL http://arxiv.org/abs/1810.01015v1
PDF http://arxiv.org/pdf/1810.01015v1.pdf
PWC https://paperswithcode.com/paper/convergence-rates-for-empirical-estimation-of
Repo
Framework
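
For concreteness, a toy implementation (not the authors' code) of the Friedman-Rafsky plug-in estimate of the Henze-Penrose divergence studied in this paper: build a Euclidean MST over the pooled samples and count the edges that join points from different classes, then apply the Berisha-Hero formula D_hat = 1 - R(m+n)/(2mn).

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def fr_hp_divergence(x, y):
    """x: (m, d) samples from class 0, y: (n, d) samples from class 1."""
    m, n = len(x), len(y)
    z = np.vstack([x, y])
    labels = np.r_[np.zeros(m), np.ones(n)]
    mst = minimum_spanning_tree(cdist(z, z)).tocoo()
    # R_{m,n}: number of MST edges whose endpoints carry different labels
    r = int(np.sum(labels[mst.row] != labels[mst.col]))
    return 1.0 - r * (m + n) / (2.0 * m * n)

rng = np.random.default_rng(0)
# Well-separated Gaussians: estimate approaches 1; identical distributions: near 0.
print(fr_hp_divergence(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))))
```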

Classification of radiology reports by modality and anatomy: A comparative study

Title Classification of radiology reports by modality and anatomy: A comparative study
Authors Marina Bendersky, Joy Wu, Tanveer Syeda-Mahmood
Abstract Data labeling is currently a time-consuming task that often requires expert knowledge. In research settings, the availability of correctly labeled data is crucial to ensure that model predictions are accurate and useful. We propose relatively simple machine learning-based models that achieve high performance metrics in the binary and multiclass classification of radiology reports. We compare the performance of these algorithms to that of a data-driven approach based on NLP, and find that the logistic regression classifier outperforms all other models in both the binary and multiclass classification tasks. We then choose the logistic regression binary classifier to predict chest X-ray (CXR) / non-chest X-ray (non-CXR) labels in reports from different datasets, unseen during any training phase of any of the models. Even in unseen report collections, the binary logistic regression classifier achieves average precision values above 0.9. Based on the regression coefficient values, we also identify frequent tokens in CXR and non-CXR reports that are features with potentially high predictive power.
Tasks
Published 2018-12-27
URL http://arxiv.org/abs/1812.10818v1
PDF http://arxiv.org/pdf/1812.10818v1.pdf
PWC https://paperswithcode.com/paper/classification-of-radiology-reports-by
Repo
Framework
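
A hedged sketch of the kind of bag-of-words logistic-regression classifier this entry describes, including reading off high-coefficient tokens as the most CXR-indicative features. The report texts and labels below are placeholders, not data from the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "pa and lateral views of the chest show no acute disease",
    "mri brain without contrast demonstrates no acute infarct",
]
labels = [1, 0]  # 1 = CXR, 0 = non-CXR (hypothetical toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(reports, labels)

# Tokens with the largest positive coefficients push reports toward the CXR class.
vec = clf.named_steps["tfidfvectorizer"]
lr = clf.named_steps["logisticregression"]
top = np.argsort(lr.coef_[0])[-10:]
print([vec.get_feature_names_out()[i] for i in top])
```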

Generative Models for Pose Transfer

Title Generative Models for Pose Transfer
Authors Patrick Chao, Alexander Li, Gokul Swamy
Abstract We investigate nearest neighbor and generative models for transferring pose between persons. We take in a video of one person performing a sequence of actions and attempt to generate a video of another person performing the same actions. Our generative model (pix2pix) outperforms k-NN at both generating corresponding frames and generalizing outside the demonstrated action set. Our most salient contribution is determining a pipeline (pose detection, face detection, k-NN based pairing) that is effective at performing the desired task. We also detail several iterative improvements and failure modes.
Tasks Face Detection, Pose Transfer
Published 2018-06-24
URL http://arxiv.org/abs/1806.09070v1
PDF http://arxiv.org/pdf/1806.09070v1.pdf
PWC https://paperswithcode.com/paper/generative-models-for-pose-transfer
Repo
Framework
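
An illustrative sketch of the k-NN pairing stage in the pipeline above, under the assumption that each frame has already been reduced to 2D pose keypoints by the pose detector; the array shapes and the 18-joint layout are assumptions, not the paper's code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_frame_pairing(source_poses, target_poses):
    """source_poses: (S, K, 2), target_poses: (T, K, 2) keypoint arrays."""
    src = source_poses.reshape(len(source_poses), -1)
    tgt = target_poses.reshape(len(target_poses), -1)
    nn = NearestNeighbors(n_neighbors=1).fit(tgt)
    _, idx = nn.kneighbors(src)
    return idx.ravel()  # idx[i] = target frame whose pose best matches source frame i

pairing = knn_frame_pairing(np.random.rand(100, 18, 2), np.random.rand(500, 18, 2))
print(pairing[:10])
```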

Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba

Title Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba
Authors Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, Dik Lun Lee
Abstract Recommender systems (RSs) have been the most important technology for increasing business at Taobao, the largest online consumer-to-consumer (C2C) platform in China. The billion-scale data in Taobao creates three major challenges for Taobao’s RS: scalability, sparsity, and cold start. In this paper, we present our technical solutions to address these three challenges. The methods are based on a graph embedding framework. We first construct an item graph from users’ behavior history. Each item is then represented as a vector using graph embedding. The item embeddings are employed to compute pairwise similarities between all items, which are then used in the recommendation process. To alleviate the sparsity and cold-start problems, side information is incorporated into the embedding framework. We propose two aggregation methods to integrate the embeddings of items and the corresponding side information. Results from offline experiments show that methods incorporating side information are superior to those that do not. Further, we describe the platform on which the embedding methods are deployed and the workflow for processing the billion-scale data in Taobao. Using online A/B tests, we show that the online click-through rates (CTRs) improve compared to the recommendation methods previously widely used in Taobao, further demonstrating the effectiveness and feasibility of our proposed methods in Taobao’s live production environment.
Tasks Graph Embedding, Recommendation Systems
Published 2018-03-06
URL http://arxiv.org/abs/1803.02349v2
PDF http://arxiv.org/pdf/1803.02349v2.pdf
PWC https://paperswithcode.com/paper/billion-scale-commodity-embedding-for-e
Repo
Framework
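
A toy sketch of the base graph-embedding step this entry outlines: build an item graph from session sequences, run DeepWalk-style random walks, and train skip-gram on the walks. The side-information aggregation methods are omitted, the walks below are unweighted for brevity, and the gensim 4.x API is assumed.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def build_item_graph(sessions):
    """Weighted directed graph of consecutive item co-occurrences in sessions."""
    g = nx.DiGraph()
    for s in sessions:
        for a, b in zip(s, s[1:]):
            w = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=w + 1)
    return g

def random_walks(g, num_walks=10, walk_len=10):
    walks = []
    for _ in range(num_walks):
        for node in g.nodes:
            walk = [node]
            while len(walk) < walk_len:
                nbrs = list(g.successors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))  # uniform; Taobao's walks are weighted
            walks.append([str(n) for n in walk])
    return walks

sessions = [["i1", "i2", "i3"], ["i2", "i3", "i4"], ["i1", "i3"]]
model = Word2Vec(random_walks(build_item_graph(sessions)),
                 vector_size=32, window=3, min_count=0, sg=1)
print(model.wv["i3"][:5])  # item embedding used for pairwise similarity
```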

Learning 3D Human Dynamics from Video

Title Learning 3D Human Dynamics from Video
Authors Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik
Abstract From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. At test time, from video, the learned temporal representation gives rise to smooth 3D mesh predictions. From a single image, our model can recover the current 3D mesh as well as its 3D past and future motion. Our approach is designed so that it can learn from videos with 2D pose annotations in a semi-supervised manner. Though annotated data is always limited, there are millions of videos uploaded daily on the Internet. In this work, we harvest this Internet-scale source of unlabeled data by training our model on unlabeled video with pseudo-ground truth 2D pose obtained from an off-the-shelf 2D pose detector. Our experiments show that adding more videos with pseudo-ground truth 2D pose monotonically improves 3D prediction performance. We evaluate our model, Human Mesh and Motion Recovery (HMMR), on the recent challenging dataset of 3D Poses in the Wild and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning. The project website with video, code, and data can be found at https://akanazawa.github.io/human_dynamics/.
Tasks 3D Human Dynamics, Human Dynamics
Published 2018-12-04
URL https://arxiv.org/abs/1812.01601v4
PDF https://arxiv.org/pdf/1812.01601v4.pdf
PWC https://paperswithcode.com/paper/learning-3d-human-dynamics-from-video
Repo
Framework
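
This is not the released HMMR code (see the project page above for that), but a minimal sketch of what a "temporal encoding of image features" can look like: a 1D convolution over per-frame features that produces a temporally smoothed representation from which a mesh regressor could predict present, past, and future poses. Dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
        )

    def forward(self, frame_feats):          # (batch, time, feat_dim)
        x = frame_feats.transpose(1, 2)      # Conv1d expects (batch, channels, time)
        return self.net(x).transpose(1, 2)   # (batch, time, hidden)

feats = torch.randn(2, 20, 2048)             # 20 frames of per-frame CNN features
print(TemporalEncoder()(feats).shape)         # torch.Size([2, 20, 1024])
```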

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework

Title DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework
Authors Zihao Liu, Tao Liu, Wujie Wen, Lei Jiang, Jie Xu, Yanzhi Wang, Gang Quan
Abstract As one of the most fascinating machine learning techniques, the deep neural network (DNN) has demonstrated excellent performance in various intelligent tasks such as image classification. DNNs achieve such performance, to a large extent, by performing expensive training over huge volumes of training data. To reduce the data storage and transfer overhead in smart resource-limited Internet-of-Things (IoT) systems, effective data compression is a “must-have” feature before transferring data produced in real time for training or classification. While there are many well-known image compression approaches (such as JPEG), we find, for the first time, that a human-visual-based image compression approach such as JPEG compression is not an optimal solution for DNN systems, especially at high compression ratios. To this end, we develop an image compression framework tailored for DNN applications, named “DeepN-JPEG”, to embrace the deep cascaded information-processing mechanism of the DNN architecture. Extensive experiments on the “ImageNet” dataset with various state-of-the-art DNNs show that “DeepN-JPEG” can achieve an ~3.5x higher compression rate than the popular JPEG solution while maintaining the same accuracy level for image recognition, demonstrating its great potential for storage and power efficiency in DNN-based smart IoT system design.
Tasks Image Classification, Image Compression
Published 2018-03-14
URL http://arxiv.org/abs/1803.05788v1
PDF http://arxiv.org/pdf/1803.05788v1.pdf
PWC https://paperswithcode.com/paper/deepn-jpeg-a-deep-neural-network-favorable
Repo
Framework
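
Hypothetical illustration only: one way to realize the general idea of a DNN-favorable quantization table is to derive the 8x8 JPEG-style table from how much each DCT band matters to the network rather than to the human visual system. The importance map below is fabricated for the sketch; DeepN-JPEG's actual table construction differs.

```python
import numpy as np

def quant_table_from_importance(importance, q_min=1, q_max=120):
    """importance: 8x8 in [0, 1]; important DCT bands get smaller quantization steps."""
    q = q_max - (q_max - q_min) * importance
    return np.clip(np.round(q), 1, 255).astype(np.uint8)

u, v = np.meshgrid(np.arange(8), np.arange(8))
importance = np.exp(-(u + v) / 6.0)   # made-up: importance decays with DCT frequency
print(quant_table_from_importance(importance))
```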

Autoencoder based image compression: can the learning be quantization independent?

Title Autoencoder based image compression: can the learning be quantization independent?
Authors Thierry Dumas, Aline Roumy, Christine Guillemot
Abstract This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performance of image compression is tuned by varying the quantization step size. In the case of autoencoders, this would in principle require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performance can be obtained with a single learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time.
Tasks Image Compression, Quantization
Published 2018-02-23
URL http://arxiv.org/abs/1802.09371v1
PDF http://arxiv.org/pdf/1802.09371v1.pdf
PWC https://paperswithcode.com/paper/autoencoder-based-image-compression-can-the
Repo
Framework
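
A minimal numeric sketch of the test-time idea: keep one learned transform fixed and sweep the uniform quantization step of the latents to trace out rate-distortion points. The Gaussian latents below stand in for an encoder's output.

```python
import numpy as np

def quantize(latents, step):
    return step * np.round(latents / step)   # uniform scalar quantization

rng = np.random.default_rng(0)
z = rng.normal(0, 4, size=1000)               # stand-in for learned-transform output
for step in (0.5, 1.0, 2.0, 4.0):
    z_hat = quantize(z, step)
    mse = float(np.mean((z - z_hat) ** 2))    # distortion grows with the step
    levels = len(np.unique(z_hat))            # crude proxy for rate: fewer symbols
    print(f"step={step}: mse={mse:.3f}, distinct levels={levels}")
```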

Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentation

Title Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentation
Authors Ting-Ting Liang, Satoshi Tsutsui, Liangcai Gao, Jing-Jing Lu, Mengyan Sun
Abstract One of the time-consuming routine tasks for a radiologist is to discern anatomical structures from tomographic images. To assist radiologists, this paper develops an automatic segmentation method for pelvic magnetic resonance (MR) images. The task has three major challenges: 1) A pelvic organ can have various sizes and shapes depending on the axial image, which requires local contexts to segment correctly. 2) Different organs often have quite similar appearances in MR images, which requires global context to segment. 3) The number of available annotated images is too small to use the latest segmentation algorithms. To address these challenges, we propose a novel convolutional neural network called Attention-Pyramid network (APNet) that effectively exploits both local and global contexts, in addition to a data-augmentation technique that is particularly effective for MR images. To evaluate our method, we construct a fine-grained (50 pelvic organs) MR image segmentation dataset, and experimentally confirm the superior performance of our techniques over state-of-the-art image segmentation methods.
Tasks Data Augmentation, Semantic Segmentation
Published 2018-06-01
URL http://arxiv.org/abs/1806.00264v2
PDF http://arxiv.org/pdf/1806.00264v2.pdf
PWC https://paperswithcode.com/paper/combining-pyramid-pooling-and-attention
Repo
Framework
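
A sketch of a pyramid pooling module of the sort APNet builds on (PSPNet-style), which is how global context gets injected alongside local features; the attention branch and the exact APNet wiring are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        # One stage per bin: pool to bin x bin, reduce channels with a 1x1 conv.
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins
        )

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(s(x), size=(h, w), mode="bilinear",
                                align_corners=False) for s in self.stages]
        return torch.cat([x] + pooled, dim=1)  # local features + global context

out = PyramidPooling(64)(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 128, 32, 32])
```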

Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks

Title Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks
Authors Atefe Rajaeefar, Ali Emami, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract Recent advances in capsule endoscopy systems have introduced new methods and capabilities. The capsule endoscopy system, by observing the entire digestive tract, has significantly improved the diagnosis of gastrointestinal disorders and diseases. The system still faces challenges that need to be addressed, such as enhancing the quality of transmitted images, low transmission frame rates, and limited battery lifetime. One of the important parts of a capsule endoscopy system is the image compression unit. Better compression of images increases the frame rate and hence improves the diagnostic process. In this paper, a high-precision compression algorithm with a high compression ratio is proposed. In this algorithm, we use the similarity between frames to compress the data more efficiently.
Tasks Image Compression
Published 2018-02-21
URL http://arxiv.org/abs/1802.07781v1
PDF http://arxiv.org/pdf/1802.07781v1.pdf
PWC https://paperswithcode.com/paper/lossless-image-compression-algorithm-for
Repo
Framework
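
A toy illustration of the inter-frame-similarity idea (not the paper's codec): losslessly code the residual between consecutive frames instead of the raw frames, which compresses well when frames change little; zlib stands in for a proper entropy coder.

```python
import zlib
import numpy as np

def compress_frame(prev, curr):
    resid = (curr.astype(np.int16) - prev.astype(np.int16))  # fits in int16
    return zlib.compress(resid.tobytes(), level=9)

def decompress_frame(prev, blob, shape):
    resid = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
    return (prev.astype(np.int16) + resid).astype(np.uint8)  # exact reconstruction

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr = np.clip(prev.astype(np.int16) + rng.integers(-2, 3, prev.shape),
               0, 255).astype(np.uint8)            # nearly identical next frame
blob = compress_frame(prev, curr)
assert np.array_equal(decompress_frame(prev, blob, curr.shape), curr)  # lossless
print(len(blob), "bytes vs", curr.nbytes, "raw")
```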

Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background

Title Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background
Authors Mahdi Ahmadi, Ali Emami, Mohsen Hajabdollahi, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian
Abstract With the increasing volume of telemedicine information, the need for medical image compression has become more important. In angiographic images, a small fraction of the entire image usually belongs to the vasculature, which provides crucial information for diagnosis. Other parts of the image are diagnostically less important and can be compressed with a higher compression ratio. However, the quality of those parts affects the visual perception of the image as well. Existing methods compress the foreground and background of angiographic images using different techniques. In this paper, we first utilize a convolutional neural network to segment vessels and then present a hierarchical block-processing algorithm capable of both eliminating background redundancies and preserving the overall visual quality of angiograms.
Tasks Image Compression
Published 2018-02-21
URL http://arxiv.org/abs/1802.07769v1
PDF http://arxiv.org/pdf/1802.07769v1.pdf
PWC https://paperswithcode.com/paper/lossless-compression-of-angiogram-foreground
Repo
Framework
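
A hedged sketch of the block-level split this entry describes: blocks that touch the vessel mask stay bit-exact, pure-background blocks are coarsely quantized. The CNN vessel segmentation is assumed given as a binary mask; block size and quantization levels are made-up parameters, and the paper's hierarchical scheme is more elaborate.

```python
import numpy as np

def compress_blocks(img, vessel_mask, block=16, bg_levels=8):
    out = img.copy()
    step = 256 // bg_levels
    for r in range(0, img.shape[0], block):
        for c in range(0, img.shape[1], block):
            if not vessel_mask[r:r+block, c:c+block].any():   # pure background block
                tile = img[r:r+block, c:c+block]
                out[r:r+block, c:c+block] = (tile // step) * step + step // 2
    return out  # foreground pixels are bit-exact; background is only visually preserved

img = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
mask = np.zeros_like(img, dtype=bool)
mask[40:60, 40:90] = True                    # fake vessel region from a segmenter
rec = compress_blocks(img, mask)
assert np.array_equal(rec[mask], img[mask])  # vasculature preserved losslessly
```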

Speeding Up the Bilateral Filter: A Joint Acceleration Way

Title Speeding Up the Bilateral Filter: A Joint Acceleration Way
Authors Longquan Dai, Mengke Yuan, Xiaopeng Zhang
Abstract The computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve a constant-time BF whose complexity is independent of the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and the shiftability property. Although each of these techniques suffers from accuracy and efficiency problems, previous algorithm designers tended to adopt only one of them when assembling fast implementations, due to the difficulty of combining them. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting-edge implementation that solves these problems. Jointly employing five techniques, kernel truncation and best N-term approximation as well as the previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform a BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and can therefore draw upon their complementary strengths to overcome individual deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing efficiency at running time.
Tasks
Published 2018-02-28
URL http://arxiv.org/abs/1803.00004v1
PDF http://arxiv.org/pdf/1803.00004v1.pdf
PWC https://paperswithcode.com/paper/speeding-up-the-bilateral-filter-a-joint
Repo
Framework
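
For reference, the O(kernel size) brute-force bilateral filter that this line of work accelerates: a spatial Gaussian times a range Gaussian, normalized per pixel. The 3D-box-filter decomposition itself is beyond a short sketch; this baseline just makes the kernel-size-dependent cost concrete.

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """img: 2D float array in [0, 1]. Cost grows with (2*radius + 1)^2 per pixel."""
    pad = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))   # fixed spatial kernel
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            rng_w = np.exp(-(window - img[i, j])**2 / (2 * sigma_r**2))  # range kernel
            w = spatial * rng_w
            out[i, j] = (w * window).sum() / w.sum()
    return out

print(bilateral_filter(np.random.rand(32, 32)).shape)
```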

Embedding Grammars

Title Embedding Grammars
Authors David Wingate, William Myers, Nancy Fulda, Tyler Etchart
Abstract Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters) matched exactly. Recent advances in word embeddings show that semantically related words share common features in a vector-space representation, suggesting the possibility of a hybrid grammar and word embedding. In this paper, we blend the structure of standard context-free grammars with the semantic generalization capabilities of word embeddings to create hybrid semantic grammars. These semantic grammars generalize the specific terminals used by the programmer to other words and phrases with related meanings, allowing the construction of compact grammars that match an entire region of the vector space rather than matching specific elements.
Tasks Intent Detection, Word Embeddings
Published 2018-08-14
URL http://arxiv.org/abs/1808.04891v1
PDF http://arxiv.org/pdf/1808.04891v1.pdf
PWC https://paperswithcode.com/paper/embedding-grammars
Repo
Framework
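
A small sketch of the core mechanism: a grammar terminal matches not just its literal string but any word whose embedding is close enough, so the terminal covers a region of vector space. The tiny embedding table and threshold below are fabricated for illustration; a real system would load pretrained word embeddings.

```python
import numpy as np

EMB = {  # stand-in word vectors, fabricated for the example
    "increase": np.array([0.90, 0.10, 0.00]),
    "raise":    np.array([0.85, 0.15, 0.05]),
    "lower":    np.array([-0.80, 0.20, 0.10]),
}

def soft_match(terminal, word, threshold=0.8):
    """True if `word` falls in the embedding-space region around `terminal`."""
    a, b = EMB[terminal], EMB[word]
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos >= threshold

print(soft_match("increase", "raise"))   # True: a semantically related word matches
print(soft_match("increase", "lower"))   # False
```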

Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions

Title Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions
Authors Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li
Abstract Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions. However, existing methods rely heavily on fully connected layers, which introduce a significant number of parameters, making the network harder to train and prone to overfitting. In this paper, we propose a simple yet effective framework for point set feature learning by leveraging a nonlinear activation layer encoded by Radial Basis Function (RBF) kernels. Unlike PointNet variants, which fail to recognize local point patterns, our approach explicitly models the spatial distribution of point clouds by aggregating features from sparsely distributed RBF kernels. A typical RBF kernel, e.g., the Gaussian function, naturally penalizes long-distance response and is only activated by neighboring points. Such localized response generates highly discriminative features given different point distributions. In addition, our framework allows the joint optimization of the kernel distribution and its receptive field, automatically evolving kernel configurations in an end-to-end manner. We demonstrate that the proposed network with a single RBF layer can outperform the state-of-the-art PointNet++ in terms of classification accuracy for 3D object recognition tasks. Moreover, the introduction of nonlinear mappings significantly reduces the number of network parameters and the computational cost, enabling significantly faster training and a deployable point cloud recognition solution on portable devices with limited resources.
Tasks 3D Object Recognition, Object Recognition
Published 2018-12-11
URL http://arxiv.org/abs/1812.04302v2
PDF http://arxiv.org/pdf/1812.04302v2.pdf
PWC https://paperswithcode.com/paper/deep-rbfnet-point-cloud-feature-learning
Repo
Framework
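
An illustrative sketch of the RBF-layer idea (assumed shapes, not the released network): Gaussian responses of each point to a set of kernel centers, max-pooled over the point set to give a permutation-invariant global descriptor. In the paper the centers and receptive fields are jointly optimized; here they are fixed random values.

```python
import numpy as np

def rbf_features(points, centers, sigma=0.2):
    """points: (N, 3), centers: (K, 3) -> (K,) permutation-invariant descriptor."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    act = np.exp(-d2 / (2 * sigma ** 2))  # localized: distant points barely activate
    return act.max(axis=0)                # aggregate over the point set

rng = np.random.default_rng(0)
cloud = rng.uniform(-1, 1, (1024, 3))
centers = rng.uniform(-1, 1, (16, 3))     # stand-in for learned kernel centers
print(rbf_features(cloud, centers).shape)  # (16,)
```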

Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context

Title Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context
Authors Jie Li, Yahui Shan, Xiaorui Wang, Yan Li
Abstract The use of future contextual information is typically shown to be helpful for acoustic modeling. Recently, we proposed an RNN model called minimal gated recurrent unit with input projection (mGRUIP), in which a context module, namely temporal convolution, is specifically designed to model the future context. This model, mGRUIP with context module (mGRUIP-Ctx), has been shown to be capable of utilizing future context effectively while maintaining quite low model latency and computation cost. In this paper, we continue to improve mGRUIP-Ctx with two revisions: applying batch normalization (BN) methods and enlarging the model context. Experimental results on two Mandarin ASR tasks (8400 hours and 60K hours) show that the revised mGRUIP-Ctx outperforms LSTM by a large margin (11% to 38%). It even performs slightly better than a superior BLSTM on the 8400h task, with 33M fewer parameters and just 290ms model latency.
Tasks
Published 2018-11-26
URL http://arxiv.org/abs/1811.10169v1
PDF http://arxiv.org/pdf/1811.10169v1.pdf
PWC https://paperswithcode.com/paper/improving-gated-recurrent-unit-based-acoustic
Repo
Framework
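
A rough sketch of a minimal GRU cell, the single-gate family this paper builds on, with batch normalization applied to the candidate activation in the spirit of the BN revision. The input projection and the temporal-convolution context module of mGRUIP-Ctx are not reproduced here, and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MinimalGRUCell(nn.Module):
    """Minimal GRU: one update gate, no reset gate."""
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.gate = nn.Linear(in_dim + hidden, hidden)   # update gate z_t
        self.cand = nn.Linear(in_dim + hidden, hidden)   # candidate state
        self.bn = nn.BatchNorm1d(hidden)                 # BN on the candidate

    def forward(self, x, h):
        zx = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.gate(zx))
        h_tilde = torch.tanh(self.bn(self.cand(zx)))
        return (1 - z) * h + z * h_tilde                 # gated interpolation

cell = MinimalGRUCell(40, 128)
h = torch.zeros(8, 128)
for t in range(10):                  # run over a 10-frame acoustic feature sequence
    h = cell(torch.randn(8, 40), h)
print(h.shape)  # torch.Size([8, 128])
```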