Paper Group ANR 791
Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering. Convergence Rates for Empirical Estimation of Binary Classification Bounds. Classification of radiology reports by modality and anatomy: A comparative study. Generative Models for Pose Transfer. Billion-scale Commodity Embedding for E-commerce Recomme …
Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering
Title | Combining Graph-based Dependency Features with Convolutional Neural Network for Answer Triggering |
Authors | Deepak Gupta, Sarah Kohail, Pushpak Bhattacharyya |
Abstract | Answer triggering is the task of selecting the best-suited answer for a given question from a set of candidate answers, if one exists. In this paper, we present a hybrid deep learning model for answer triggering, which combines several dependency graph based alignment features, namely graph edit distance, graph-based similarity and dependency graph coverage, with dense vector embeddings from a Convolutional Neural Network (CNN). Our experiments on the WikiQA dataset show that such a combination can more accurately trigger a candidate answer compared to the previous state-of-the-art models. A comparative study on the WikiQA dataset shows a 5.86% absolute F-score improvement at the question level. |
Tasks | |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01650v1 |
http://arxiv.org/pdf/1808.01650v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-graph-based-dependency-features |
Repo | |
Framework | |
Convergence Rates for Empirical Estimation of Binary Classification Bounds
Title | Convergence Rates for Empirical Estimation of Binary Classification Bounds |
Authors | Salimeh Yasaei Sekeh, Morteza Noshad, Kevin R. Moon, Alfred O. Hero |
Abstract | Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze-Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman-Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman-Rafsky estimator of the Henze-Penrose divergence. We validate our results experimentally and illustrate their application to real datasets. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.01015v1 |
http://arxiv.org/pdf/1810.01015v1.pdf | |
PWC | https://paperswithcode.com/paper/convergence-rates-for-empirical-estimation-of |
Repo | |
Framework | |
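The Friedman-Rafsky construction described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it builds a Euclidean MST over the merged samples with Prim's algorithm, counts the edges that join points from different samples, and plugs that count R into the commonly stated estimator 1 - R(m + n) / (2mn). The abstract does not give that formula, so treat it as an assumption here; the function names are hypothetical.

```python
import math


def mst_edges(points):
    """Return the edges (i, j) of a Euclidean minimal spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the best known connection for every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def fr_hp_divergence(x, y):
    """Friedman-Rafsky style estimate of the Henze-Penrose divergence.

    Assumed estimator form: 1 - R * (m + n) / (2 * m * n), where R is the
    number of MST edges connecting a point of x to a point of y.
    """
    merged = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    cross = sum(1 for i, j in mst_edges(merged) if labels[i] != labels[j])
    m, n = len(x), len(y)
    return 1.0 - cross * (m + n) / (2.0 * m * n)


# Two well-separated toy samples: only one MST edge crosses between them.
x_samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y_samples = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(fr_hp_divergence(x_samples, y_samples))
```

With heavily interleaved samples the raw value can drop below zero; whether to clip it at zero is a choice this sketch leaves to the caller.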
Classification of radiology reports by modality and anatomy: A comparative study
Title | Classification of radiology reports by modality and anatomy: A comparative study |
Authors | Marina Bendersky, Joy Wu, Tanveer Syeda-Mahmood |
Abstract | Data labeling is currently a time-consuming task that often requires expert knowledge. In research settings, the availability of correctly labeled data is crucial to ensure that model predictions are accurate and useful. We propose relatively simple machine learning-based models that achieve high performance metrics in the binary and multiclass classification of radiology reports. We compare the performance of these algorithms to that of a data-driven approach based on NLP, and find that the logistic regression classifier outperforms all other models, in both the binary and multiclass classification tasks. We then choose the logistic regression binary classifier to predict chest X-ray (CXR)/ non-chest X-ray (non-CXR) labels in reports from different datasets, unseen during any training phase of any of the models. Even in unseen report collections, the binary logistic regression classifier achieves average precision values of above 0.9. Based on the regression coefficient values, we also identify frequent tokens in CXR and non-CXR reports that are features with possibly high predictive power. |
Tasks | |
Published | 2018-12-27 |
URL | http://arxiv.org/abs/1812.10818v1 |
http://arxiv.org/pdf/1812.10818v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-radiology-reports-by |
Repo | |
Framework | |
Generative Models for Pose Transfer
Title | Generative Models for Pose Transfer |
Authors | Patrick Chao, Alexander Li, Gokul Swamy |
Abstract | We investigate nearest neighbor and generative models for transferring pose between persons. We take in a video of one person performing a sequence of actions and attempt to generate a video of another person performing the same actions. Our generative model (pix2pix) outperforms k-NN at both generating corresponding frames and generalizing outside the demonstrated action set. Our most salient contribution is determining a pipeline (pose detection, face detection, k-NN based pairing) that is effective at performing the desired task. We also detail several iterative improvements and failure modes. |
Tasks | Face Detection, Pose Transfer |
Published | 2018-06-24 |
URL | http://arxiv.org/abs/1806.09070v1 |
http://arxiv.org/pdf/1806.09070v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-models-for-pose-transfer |
Repo | |
Framework | |
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba
Title | Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba |
Authors | Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, Dik Lun Lee |
Abstract | Recommender systems (RSs) have been the most important technology for increasing the business in Taobao, the largest online consumer-to-consumer (C2C) platform in China. The billion-scale data in Taobao creates three major challenges to Taobao’s RS: scalability, sparsity and cold start. In this paper, we present our technical solutions to address these three challenges. The methods are based on the graph embedding framework. We first construct an item graph from users’ behavior history. Each item is then represented as a vector using graph embedding. The item embeddings are employed to compute pairwise similarities between all items, which are then used in the recommendation process. To alleviate the sparsity and cold start problems, side information is incorporated into the embedding framework. We propose two aggregation methods to integrate the embeddings of items and the corresponding side information. Experimental results from offline experiments show that methods incorporating side information are superior to those that do not. Further, we describe the platform upon which the embedding methods are deployed and the workflow to process the billion-scale data in Taobao. Using online A/B tests, we show that the online Click-Through-Rates (CTRs) are improved compared to the previous recommendation methods widely used in Taobao, further demonstrating the effectiveness and feasibility of our proposed methods in Taobao’s live production environment. |
Tasks | Graph Embedding, Recommendation Systems |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02349v2 |
http://arxiv.org/pdf/1803.02349v2.pdf | |
PWC | https://paperswithcode.com/paper/billion-scale-commodity-embedding-for-e |
Repo | |
Framework | |
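The pipeline in the abstract above (item graph from behavior history, item vectors, pairwise similarity, aggregation with side information) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not say how edges are weighted or which two aggregation methods are used, so the sketch counts consecutive item pairs within a session and uses plain averaging as a stand-in. The embedding training step itself is omitted.

```python
import math
from collections import defaultdict


def build_item_graph(sessions):
    """Weighted directed item graph: edge (a, b) counts how often b directly follows a."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:  # ignore immediate repeats of the same item
                graph[a][b] += 1
    return graph


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aggregate(item_vec, side_vecs):
    """Average an item embedding with its side-information embeddings (assumed method)."""
    vecs = [item_vec] + list(side_vecs)
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Hypothetical behavior sessions, one list of item ids per user session.
sessions = [["a", "b", "c"], ["a", "b"], ["b", "c", "c"]]
graph = build_item_graph(sessions)
print({src: dict(dst) for src, dst in graph.items()})
```

Averaging gives a cold-start item with no behavior history a usable vector built from its side information alone, which is the motivation the abstract gives for incorporating it.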
Learning 3D Human Dynamics from Video
Title | Learning 3D Human Dynamics from Video |
Authors | Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik |
Abstract | From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. At test time, from video, the learned temporal representation gives rise to smooth 3D mesh predictions. From a single image, our model can recover the current 3D mesh as well as its 3D past and future motion. Our approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner. Though annotated data is always limited, there are millions of videos uploaded daily on the Internet. In this work, we harvest this Internet-scale source of unlabeled data by training our model on unlabeled video with pseudo-ground truth 2D pose obtained from an off-the-shelf 2D pose detector. Our experiments show that adding more videos with pseudo-ground truth 2D pose monotonically improves 3D prediction performance. We evaluate our model, Human Mesh and Motion Recovery (HMMR), on the recent challenging dataset of 3D Poses in the Wild and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning. The project website with video, code, and data can be found at https://akanazawa.github.io/human_dynamics/. |
Tasks | 3D Human Dynamics, Human Dynamics |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01601v4 |
https://arxiv.org/pdf/1812.01601v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-3d-human-dynamics-from-video |
Repo | |
Framework | |
DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework
Title | DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework |
Authors | Zihao Liu, Tao Liu, Wujie Wen, Lei Jiang, Jie Xu, Yanzhi Wang, Gang Quan |
Abstract | As one of the most fascinating machine learning techniques, deep neural network (DNN) has demonstrated excellent performance in various intelligent tasks such as image classification. DNN achieves such performance, to a large extent, by performing expensive training over huge volumes of training data. To reduce the data storage and transfer overhead in smart resource-limited Internet-of-Things (IoT) systems, effective data compression is a “must-have” feature before transferring real-time produced dataset for training or classification. While there have been many well-known image compression approaches (such as JPEG), we for the first time find that a human-visual based image compression approach such as JPEG compression is not an optimized solution for DNN systems, especially with high compression ratios. To this end, we develop an image compression framework tailored for DNN applications, named “DeepN-JPEG”, to embrace the nature of deep cascaded information process mechanism of DNN architecture. Extensive experiments, based on “ImageNet” dataset with various state-of-the-art DNNs, show that “DeepN-JPEG” can achieve ~3.5x higher compression rate over the popular JPEG solution while maintaining the same accuracy level for image recognition, demonstrating its great potential of storage and power efficiency in DNN-based smart IoT system design. |
Tasks | Image Classification, Image Compression |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05788v1 |
http://arxiv.org/pdf/1803.05788v1.pdf | |
PWC | https://paperswithcode.com/paper/deepn-jpeg-a-deep-neural-network-favorable |
Repo | |
Framework | |
Autoencoder based image compression: can the learning be quantization independent?
Title | Autoencoder based image compression: can the learning be quantization independent? |
Authors | Thierry Dumas, Aline Roumy, Christine Guillemot |
Abstract | This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performances of image compression are tuned by varying the quantization step size. In the case of autoencoders, this in principle would require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performances can be obtained with a unique learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time. |
Tasks | Image Compression, Quantization |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.09371v1 |
http://arxiv.org/pdf/1802.09371v1.pdf | |
PWC | https://paperswithcode.com/paper/autoencoder-based-image-compression-can-the |
Repo | |
Framework | |
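The idea of reaching different rate-distortion points by changing only the quantization step size at test time can be illustrated without any learned transform. The sketch below assumes a fixed latent vector (a hypothetical stand-in for an encoder output), applies uniform scalar quantization at two step sizes, and reports the empirical entropy of the indices as a proxy for rate and the mean squared error in latent space as a proxy for distortion. It is not the authors' pipeline.

```python
import math
from collections import Counter


def quantize(latent, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(z / step) for z in latent]


def dequantize(indices, step):
    """Reconstruct values from quantization indices."""
    return [i * step for i in indices]


def empirical_entropy(indices):
    """Entropy in bits per symbol of the index sequence (a proxy for rate)."""
    counts = Counter(indices)
    total = len(indices)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rate_distortion(latent, step):
    """Return (rate proxy, distortion proxy) for one quantization step size."""
    idx = quantize(latent, step)
    return empirical_entropy(idx), mse(latent, dequantize(idx, step))


# Hypothetical latent coefficients; the transform that produced them stays fixed.
latent = [0.12, -0.47, 1.93, 0.88, -1.35, 0.05, 0.51, -0.96]
for step in (0.25, 1.0):
    rate, dist = rate_distortion(latent, step)
    print(f"step={step}: rate={rate:.3f} bits/symbol, distortion={dist:.4f}")
```

A larger step merges more values into the same index, so the rate proxy falls while the distortion rises. That trade-off is the one the abstract proposes to control with a single learned transform.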
Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentaion
Title | Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentaion |
Authors | Ting-Ting Liang, Satoshi Tsutsui, Liangcai Gao, Jing-Jing Lu, Mengyan Sun |
Abstract | One of the time-consuming routine tasks for a radiologist is to discern anatomical structures from tomographic images. To assist radiologists, this paper develops an automatic segmentation method for pelvic magnetic resonance (MR) images. The task has three major challenges: 1) A pelvic organ can have various sizes and shapes depending on the axial image, which requires local contexts to segment correctly. 2) Different organs often have quite similar appearance in MR images, which requires global context to segment. 3) The number of available annotated images is too small to use the latest segmentation algorithms. To address the challenges, we propose a novel convolutional neural network called Attention-Pyramid network (APNet) that effectively exploits both local and global contexts, in addition to a data-augmentation technique that is particularly effective for MR images. In order to evaluate our method, we construct a fine-grained (50 pelvic organs) MR image segmentation dataset, and experimentally confirm the superior performance of our techniques over the state-of-the-art image segmentation methods. |
Tasks | Data Augmentation, Semantic Segmentation |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00264v2 |
http://arxiv.org/pdf/1806.00264v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-pyramid-pooling-and-attention |
Repo | |
Framework | |
Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks
Title | Lossless Image Compression Algorithm for Wireless Capsule Endoscopy by Content-Based Classification of Image Blocks |
Authors | Atefe Rajaeefar, Ali Emami, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian |
Abstract | Recent advances in capsule endoscopy systems have introduced new methods and capabilities. The capsule endoscopy system, by observing the entire digestive tract, has significantly improved diagnosing gastrointestinal disorders and diseases. The system has challenges such as the need to enhance the quality of the transmitted images, low frame rates of transmission, and battery lifetime that need to be addressed. One of the important parts of a capsule endoscopy system is the image compression unit. Better compression of images increases the frame rate and hence improves the diagnosis process. In this paper a high precision compression algorithm with high compression ratio is proposed. In this algorithm we use the similarity between frames to compress the data more efficiently. |
Tasks | Image Compression |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07781v1 |
http://arxiv.org/pdf/1802.07781v1.pdf | |
PWC | https://paperswithcode.com/paper/lossless-image-compression-algorithm-for |
Repo | |
Framework | |
Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background
Title | Lossless Compression of Angiogram Foreground with Visual Quality Preservation of Background |
Authors | Mahdi Ahmadi, Ali Emami, Mohsen Hajabdollahi, S. M. Reza Soroushmehr, Nader Karimi, Shadrokh Samavi, Kayvan Najarian |
Abstract | With the increasing volume of telemedicine information, the need for medical image compression has become more important. In angiographic images, a small ratio of the entire image usually belongs to the vasculature that provides crucial information for diagnosis. Other parts of the image are diagnostically less important and can be compressed with a higher compression ratio. However, the quality of those parts affects the visual perception of the image as well. Existing methods compress foreground and background of angiographic images using different techniques. In this paper we first utilize a convolutional neural network to segment vessels and then present a hierarchical block processing algorithm capable of both eliminating the background redundancies and preserving the overall visual quality of angiograms. |
Tasks | Image Compression |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07769v1 |
http://arxiv.org/pdf/1802.07769v1.pdf | |
PWC | https://paperswithcode.com/paper/lossless-compression-of-angiogram-foreground |
Repo | |
Framework | |
Speeding Up the Bilateral Filter: A Joint Acceleration Way
Title | Speeding Up the Bilateral Filter: A Joint Acceleration Way |
Authors | Longquan Dai, Mengke Yuan, Xiaopeng Zhang |
Abstract | Computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve the constant-time BF whose complexity is independent of the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and shiftability property. Although each of the above techniques suffers from accuracy and efficiency problems, previous algorithm designers typically used only one of them to assemble fast implementations because of the difficulty of combining them. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting edge implementation that solves these problems. Jointly employing five techniques: kernel truncation, best N-term approximation as well as previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and, therefore, can draw upon one another’s strong point to overcome deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing the efficiency at running time. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1803.00004v1 |
http://arxiv.org/pdf/1803.00004v1.pdf | |
PWC | https://paperswithcode.com/paper/speeding-up-the-bilateral-filter-a-joint |
Repo | |
Framework | |
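For reference, the brute-force bilateral filter that the abstract above takes as its starting point can be written directly. This is a minimal 1D sketch with Gaussian spatial and range kernels (an assumption; the paper handles arbitrary kernels), and it is the slow baseline, not the accelerated method. The inner loop runs over the whole window, so the cost per sample grows with the kernel radius.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 1D signal.

    Each output sample is a weighted mean of its neighbors, where the weight
    is the product of a spatial Gaussian (distance in index) and a range
    Gaussian (difference in value). Cost is O(len(signal) * radius).
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            w = w_spatial * w_range
            num += w * signal[j]
            den += w
        # den >= 1 because the center sample always has weight 1.
        out.append(num / den)
    return out


# A step edge: the range kernel keeps the two plateaus from blending.
step_signal = [0.0] * 5 + [100.0] * 5
print(bilateral_filter_1d(step_signal, radius=3, sigma_s=2.0, sigma_r=1.0))
```

With a small `sigma_r`, samples across the edge get a range weight close to zero, so the edge survives smoothing. That edge-preserving behavior is what the accelerated variants aim to keep.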
Embedding Grammars
Title | Embedding Grammars |
Authors | David Wingate, William Myers, Nancy Fulda, Tyler Etchart |
Abstract | Classic grammars and regular expressions can be used for a variety of purposes, including parsing, intent detection, and matching. However, the comparisons are performed at a structural level, with constituent elements (words or characters) matched exactly. Recent advances in word embeddings show that semantically related words share common features in a vector-space representation, suggesting the possibility of a hybrid grammar and word embedding. In this paper, we blend the structure of standard context-free grammars with the semantic generalization capabilities of word embeddings to create hybrid semantic grammars. These semantic grammars generalize the specific terminals used by the programmer to other words and phrases with related meanings, allowing the construction of compact grammars that match an entire region of the vector space rather than matching specific elements. |
Tasks | Intent Detection, Word Embeddings |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04891v1 |
http://arxiv.org/pdf/1808.04891v1.pdf | |
PWC | https://paperswithcode.com/paper/embedding-grammars |
Repo | |
Framework | |
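The core idea in the abstract above, letting a grammar terminal match any word close to it in embedding space instead of requiring an exact match, can be sketched for the simplest case of a single rule made of consecutive terminals. The toy three-dimensional embeddings, the cosine threshold of 0.9, and the function names are all hypothetical; a real system would use pretrained word vectors and a full context-free grammar parser.

```python
import math

# Hypothetical toy embeddings; real word vectors have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "buy": [0.9, 0.1, 0.0],
    "purchase": [0.85, 0.2, 0.05],
    "ticket": [0.1, 0.9, 0.1],
    "pass": [0.15, 0.8, 0.2],
    "banana": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def terminal_matches(terminal, word, threshold=0.9):
    """A terminal matches a word exactly, or when their embeddings are close enough."""
    if terminal == word:
        return True
    if terminal not in TOY_EMBEDDINGS or word not in TOY_EMBEDDINGS:
        return False
    return cosine(TOY_EMBEDDINGS[terminal], TOY_EMBEDDINGS[word]) >= threshold


def matches_rule(rule, tokens, threshold=0.9):
    """Match a token sequence against one rule consisting of consecutive terminals."""
    return len(rule) == len(tokens) and all(
        terminal_matches(t, w, threshold) for t, w in zip(rule, tokens)
    )


rule = ["buy", "ticket"]
print(matches_rule(rule, ["purchase", "pass"]))    # semantically close words
print(matches_rule(rule, ["purchase", "banana"]))  # unrelated second word
```

The threshold controls how large a region of the vector space each terminal covers: lowering it makes the grammar more permissive, raising it moves the behavior back toward exact matching.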
Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions
Title | Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions |
Authors | Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li |
Abstract | Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions. However, existing methods rely heavily on fully connected layers, which introduce a significant number of parameters, making the network harder to train and prone to overfitting problems. In this paper, we propose a simple yet effective framework for point set feature learning by leveraging a nonlinear activation layer encoded by Radial Basis Function (RBF) kernels. Unlike PointNet variants, which fail to recognize local point patterns, our approach explicitly models the spatial distribution of point clouds by aggregating features from sparsely distributed RBF kernels. A typical RBF kernel, e.g. Gaussian function, naturally penalizes long-distance response and is only activated by neighboring points. Such localized response generates highly discriminative features given different point distributions. In addition, our framework allows the joint optimization of kernel distribution and its receptive field, automatically evolving kernel configurations in an end-to-end manner. We demonstrate that the proposed network with a single RBF layer can outperform the state-of-the-art PointNet++ in terms of classification accuracy for 3D object recognition tasks. Moreover, the introduction of nonlinear mappings significantly reduces the number of network parameters and computational cost, enabling significantly faster training and a deployable point cloud recognition solution on portable devices with limited resources. |
Tasks | 3D Object Recognition, Object Recognition |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04302v2 |
http://arxiv.org/pdf/1812.04302v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-rbfnet-point-cloud-feature-learning |
Repo | |
Framework | |
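The localized response of a Gaussian RBF kernel mentioned in the abstract above is easy to see numerically. The sketch below computes one feature per kernel center by summing Gaussian responses over all points of a cloud. Summation as the aggregation, the fixed shared `sigma`, and the function name are assumptions for illustration; in the paper the kernel centers and receptive fields are learned end to end, which this sketch does not attempt.

```python
import math


def rbf_features(points, centers, sigma):
    """One feature per kernel center: summed Gaussian response over all points.

    A point contributes exp(-||p - c||^2 / (2 * sigma^2)), so only points
    near a center activate that kernel noticeably.
    """
    features = []
    for c in centers:
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, c))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        features.append(total)
    return features


# A small hypothetical cloud clustered near the origin, and two kernel centers.
cloud = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1), (-0.1, 0.0, 0.0)]
centers = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(rbf_features(cloud, centers, sigma=0.5))
```

The kernel at the origin responds strongly to the nearby points while the distant kernel stays near zero, so the feature vector encodes where in space the points lie.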
Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context
Title | Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context |
Authors | Jie Li, Yahui Shan, Xiaorui Wang, Yan Li |
Abstract | The use of future contextual information is typically shown to be helpful for acoustic modeling. Recently, we proposed an RNN model called minimal gated recurrent unit with input projection (mGRUIP), in which a context module, namely temporal convolution, is specifically designed to model the future context. This model, mGRUIP with context module (mGRUIP-Ctx), has been shown to be capable of utilizing the future context effectively, with quite low model latency and computation cost. In this paper, we continue to improve mGRUIP-Ctx with two revisions: applying BN methods and enlarging model context. Experimental results on two Mandarin ASR tasks (8400 hours and 60K hours) show that the revised mGRUIP-Ctx outperforms LSTM by a large margin (11% to 38%). It even performs slightly better than a superior BLSTM on the 8400h task, with 33M fewer parameters and just 290ms model latency. |
Tasks | |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10169v1 |
http://arxiv.org/pdf/1811.10169v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-gated-recurrent-unit-based-acoustic |
Repo | |
Framework | |