Paper Group AWR 330
Papers in this group:
- Arabic Text Diacritization Using Deep Neural Networks
- Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation
- Topological Machine Learning for Multivariate Time Series
- Uncovering the Semantics of Wikipedia Categories
- Estimating Information-Theoretic Quantities with Uncertainty Forests
- A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
- ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
- Removing input features via a generative model to explain their attributions to an image classifier’s decisions
- Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models
- Min-max Entropy for Weakly Supervised Pointwise Localization
- RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification
- Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
- Joint Multi-frame Detection and Segmentation for Multi-cell Tracking
- A Repository of Conversational Datasets
- JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Arabic Text Diacritization Using Deep Neural Networks
Title | Arabic Text Diacritization Using Deep Neural Networks |
Authors | Ali Fadel, Ibraheem Tuffaha, Bara’ Al-Jawarneh, Mahmoud Al-Ayyoub |
Abstract | Diacritization of Arabic text is both an interesting and a challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. As with many other tasks in Arabic language processing, the limited effort invested in this problem and the lack of available (open-source) resources hinder progress towards solving it. This work provides a critical review of the currently existing systems, measures and resources for Arabic text diacritization. Moreover, it introduces a much-needed free-for-all cleaned dataset that can be easily used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The results of the experiments show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools, with a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, the best DER among the non-neural approaches (obtained by the Mishkal tool). |
Tasks | Arabic Text Diacritization |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1905.01965v1 |
http://arxiv.org/pdf/1905.01965v1.pdf | |
PWC | https://paperswithcode.com/paper/190501965 |
Repo | https://github.com/Barqawiz/Shakkala |
Framework | tf |
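The headline metric above is the Diacritic Error Rate (DER). As a rough illustration of what it measures, here is a minimal sketch of a character-level DER, assuming the reference and hypothesis share the same undiacritized text and that diacritics fall in the Arabic tashkeel range U+064B–U+0652; the benchmark's official scoring script may count differently (e.g., in its handling of case endings).

```python
# Minimal DER sketch: fraction of base characters with a wrong diacritic set.
ARABIC_DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}  # tashkeel marks

def diacritics_per_char(text):
    """Group each base character with the diacritics that follow it."""
    pairs = []
    for ch in text:
        if ch in ARABIC_DIACRITICS and pairs:
            pairs[-1][1].add(ch)
        else:
            pairs.append((ch, set()))
    return pairs

def der(reference, hypothesis):
    ref = diacritics_per_char(reference)
    hyp = diacritics_per_char(hypothesis)
    # Sketch assumes both sides have identical undiacritized text.
    assert len(ref) == len(hyp)
    errors = sum(r[1] != h[1] for r, h in zip(ref, hyp))
    return errors / max(len(ref), 1)
```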
Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation
Title | Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation |
Authors | Robert B. Grupp, Rachel A. Hegeman, Ryan J. Murphy, Clayton P. Alexander, Yoshito Otake, Benjamin A. McArthur, Mehran Armand, Russell H. Taylor |
Abstract | Objective: State of the art navigation systems for pelvic osteotomies use optical systems with external fiducials. We propose the use of X-Ray navigation for pose estimation of periacetabular fragments without fiducials. Methods: A 2D/3D registration pipeline was developed to recover fragment pose. This pipeline was tested through an extensive simulation study and 6 cadaveric surgeries. Using osteotomy boundaries in the fluoroscopic images, the preoperative plan is refined to more accurately match the intraoperative shape. Results: In simulation, average fragment pose errors were 1.3°/1.7 mm when the planned fragment matched the intraoperative fragment, 2.2°/2.1 mm when the plan was not updated to match the true shape, and 1.9°/2.0 mm when the fragment shape was intraoperatively estimated. In cadaver experiments, the average pose errors were 2.2°/2.2 mm, 3.8°/2.5 mm, and 3.5°/2.2 mm when registering with the actual fragment shape, a preoperative plan, and an intraoperatively refined plan, respectively. Average errors of the lateral center edge angle were less than 2° for all fragment shapes in simulation and cadaver experiments. Conclusion: The proposed pipeline is capable of accurately reporting femoral head coverage within a range clinically identified for long-term joint survivability. Significance: Human interpretation of fragment pose is challenging and usually restricted to rotation about a single anatomical axis. The proposed pipeline provides an intraoperative estimate of rigid pose with respect to all anatomical axes, is compatible with minimally invasive incisions, and has no dependence on external fiducials. |
Tasks | Pose Estimation |
Published | 2019-03-22 |
URL | https://arxiv.org/abs/1903.09339v2 |
https://arxiv.org/pdf/1903.09339v2.pdf | |
PWC | https://paperswithcode.com/paper/pose-estimation-of-periacetabular-osteotomy |
Repo | https://github.com/rg2/DeepFluoroLabeling-IPCAI2020 |
Framework | pytorch |
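The reported pose errors pair a rotation component (degrees) with a translation component (mm). A small sketch of how such errors are commonly computed between an estimated and a ground-truth rigid transform; the 4×4 homogeneous matrices here are hypothetical inputs, not the authors' registration code.

```python
import numpy as np

def pose_error(T_est, T_gt):
    """Rotation error (degrees) and translation error (mm) between two
    4x4 homogeneous rigid transforms, as typically reported in 2D/3D
    registration studies."""
    R_delta = T_est[:3, :3].T @ T_gt[:3, :3]
    # Angle of the relative rotation, clipped for numerical safety.
    cos_theta = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_theta))
    trans_err_mm = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    return rot_err_deg, trans_err_mm
```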
Topological Machine Learning for Multivariate Time Series
Title | Topological Machine Learning for Multivariate Time Series |
Authors | Chengyuan Wu, Carol Anne Hargreaves |
Abstract | We develop a framework for analyzing multivariate time series using topological data analysis (TDA) methods. The proposed methodology involves converting the multivariate time series to point cloud data, calculating Wasserstein distances between the persistence diagrams and using the $k$-nearest neighbors algorithm ($k$-NN) for supervised machine learning. Two methods (symmetry-breaking and anchor points) are also introduced to enable TDA to better analyze data with heterogeneous features that are sensitive to translation, rotation, or choice of coordinates. We apply our methods to room occupancy detection based on 5 time-dependent variables (temperature, humidity, light, CO2 and humidity ratio). Experimental results show that topological methods are effective in predicting room occupancy during a time window. |
Tasks | Time Series, Topological Data Analysis |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12082v1 |
https://arxiv.org/pdf/1911.12082v1.pdf | |
PWC | https://paperswithcode.com/paper/topological-machine-learning-for-multivariate |
Repo | https://github.com/wuchengyuan88/room-occupancy-topology |
Framework | none |
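The pipeline in the abstract (time series → point cloud → persistence diagram → Wasserstein distances → k-NN) can be sketched as below, assuming the ripser and persim packages are installed. This is a plain delay-embedding sketch; the paper's exact point-cloud construction and its symmetry-breaking and anchor-point tricks are not reproduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from ripser import ripser          # persistence diagrams
from persim import wasserstein     # distance between diagrams

def sliding_window_cloud(series, dim=3, tau=1):
    """Takens-style delay embedding: each window of the (T, d)
    multivariate series becomes one point of the cloud."""
    n = len(series) - (dim - 1) * tau
    return np.stack([series[i : i + (dim - 1) * tau + 1 : tau].ravel()
                     for i in range(n)])

def diagram(series):
    return ripser(sliding_window_cloud(series))['dgms'][1]  # H1 diagram

def knn_on_diagrams(train_series, train_labels, test_series, k=5):
    train_d = [diagram(s) for s in train_series]
    test_d = [diagram(s) for s in test_series]
    D_train = np.array([[wasserstein(a, b) for b in train_d] for a in train_d])
    D_test = np.array([[wasserstein(a, b) for b in train_d] for a in test_d])
    clf = KNeighborsClassifier(n_neighbors=k, metric='precomputed')
    clf.fit(D_train, train_labels)
    return clf.predict(D_test)
```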
Uncovering the Semantics of Wikipedia Categories
Title | Uncovering the Semantics of Wikipedia Categories |
Authors | Nicolas Heist, Heiko Paulheim |
Abstract | The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs like YAGO or Probase, and has been used extensively for tasks like entity disambiguation or semantic similarity estimation. Wikipedia’s categories are a rich source of taxonomic as well as non-taxonomic information. The category ‘German science fiction writers’, for example, encodes the type of its resources (Writer), as well as their nationality (German) and genre (Science Fiction). Several approaches in the literature make use of fractions of this encoded information without exploiting its full potential. In this paper, we introduce an approach for the discovery of category axioms that uses information from the category network, category instances, and their lexicalisations. With DBpedia as background knowledge, we discover 703k axioms covering 502k of Wikipedia’s categories and populate the DBpedia knowledge graph with an additional 4.4M relation assertions and 3.3M type assertions at more than 87% and 90% precision, respectively. |
Tasks | Entity Disambiguation, Knowledge Graphs, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12089v1 |
https://arxiv.org/pdf/1906.12089v1.pdf | |
PWC | https://paperswithcode.com/paper/uncovering-the-semantics-of-wikipedia |
Repo | https://github.com/nheist/Cat2Ax |
Framework | none |
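The abstract's running example ('German science fiction writers' → type Writer, nationality German, genre Science Fiction) can be illustrated with a toy, hand-written decomposition. The patterns and vocabularies below are purely hypothetical; Cat2Ax itself induces such axioms statistically from the category network, instances, and lexicalisations rather than hard-coding them.

```python
# Toy illustration only: one hand-written pattern for categories of the
# form "<nationality> <genre> writers".
NATIONALITIES = {"German", "French", "Japanese"}
GENRES = {"science fiction", "fantasy", "crime"}

def decompose_category(label):
    tokens = label.split()
    head = tokens[-1].rstrip('s').capitalize()   # 'writers' -> 'Writer'
    rest = " ".join(tokens[:-1])
    axioms = {"type": head}
    for nat in NATIONALITIES:
        if rest.startswith(nat):
            axioms["nationality"] = nat
            rest = rest[len(nat):].strip()
    if rest.lower() in GENRES:
        axioms["genre"] = rest.title()
    return axioms

print(decompose_category("German science fiction writers"))
# {'type': 'Writer', 'nationality': 'German', 'genre': 'Science Fiction'}
```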
Estimating Information-Theoretic Quantities with Uncertainty Forests
Title | Estimating Information-Theoretic Quantities with Uncertainty Forests |
Authors | Richard Guo, Ronak Mehta, Jesus Arroyo, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein |
Abstract | Information-theoretic quantities, such as mutual information and conditional entropy, are useful statistics for measuring the dependence between two random variables. However, estimating these quantities in a non-parametric fashion is difficult, especially when the variables are high-dimensional, a mixture of continuous and discrete values, or both. In this paper, we propose a decision forest method, Conditional Forests (CF), to estimate these quantities. By combining quantile regression forests with honest sampling, and introducing a finite sample correction, CF improves finite sample bias in a range of settings. We demonstrate through simulations that CF achieves smaller bias and variance in both low- and high-dimensional settings for estimating posteriors, conditional entropy, and mutual information. We then use CF to estimate the amount of information between neuron class and other cellular features. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00325v3 |
https://arxiv.org/pdf/1907.00325v3.pdf | |
PWC | https://paperswithcode.com/paper/estimating-information-theoretic-quantities |
Repo | https://github.com/neurodata/uncertainty-forest |
Framework | none |
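The basic recipe behind such estimators is a plug-in: estimate posteriors p(y|x) with a forest, then compute I(X;Y) = H(Y) − H(Y|X). A hedged sketch using scikit-learn's RandomForestClassifier in place of the authors' Conditional Forests; it omits the honest sampling (disjoint subsamples for tree structure and leaf estimates) and finite-sample correction that the paper adds precisely to reduce the bias this naive version suffers from.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def mutual_information_forest(X, y, n_estimators=300):
    """Naive plug-in estimate of I(X;Y) = H(Y) - H(Y|X).
    Assumes y holds integer class labels 0..C-1."""
    clf = RandomForestClassifier(n_estimators=n_estimators)
    clf.fit(X, y)
    posteriors = clf.predict_proba(X)      # p(y | x_i) per sample; biased
    h_y_given_x = np.mean([entropy(p) for p in posteriors])
    class_priors = np.bincount(y) / len(y)
    return entropy(class_priors) - h_y_given_x
```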
A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
Title | A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models |
Authors | Maksim Kuznetsov, Daniil Polykovskiy, Dmitry Vetrov, Alexander Zhebrak |
Abstract | Generative models produce realistic objects in many domains, including text, image, video, and audio synthesis. Most popular models—Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs)—usually employ a standard Gaussian distribution as a prior. Previous works show that a richer family of prior distributions may help to avoid the mode collapse problem in GANs and to improve the evidence lower bound in VAEs. We propose a new family of prior distributions—Tensor Ring Induced Prior (TRIP)—that packs an exponential number of Gaussians into a high-dimensional lattice with a relatively small number of parameters. We show that these priors improve Fréchet Inception Distance for GANs and Evidence Lower Bound for VAEs. We also study generative models with TRIP in the conditional generation setup with missing conditions. Altogether, we propose a novel plug-and-play framework for generative models that can be utilized in any GAN- or VAE-like architecture. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13148v1 |
https://arxiv.org/pdf/1910.13148v1.pdf | |
PWC | https://paperswithcode.com/paper/a-prior-of-a-googol-gaussians-a-tensor-ring |
Repo | https://github.com/insilicomedicine/TRIP |
Framework | pytorch |
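The "googol Gaussians" arise because mixture components are indexed by lattice points while their weights are parameterized by a tensor ring. A very rough numpy sketch of the core contraction, with hypothetical shapes; the actual TRIP prior additionally attaches Gaussian means and variances to the lattice nodes and constrains the weights to be a valid distribution.

```python
import numpy as np

# One tensor-ring core per latent dimension, shape (rank, modes, rank).
# With D cores of m modes each, the ring implicitly indexes m**D mixture
# components while storing only D * rank * m * rank parameters.
rng = np.random.default_rng(0)
D, modes, rank = 100, 10, 4        # 10**100 components -- a googol
cores = [rng.standard_normal((rank, modes, rank)) for _ in range(D)]

def ring_weight(index):
    """Unnormalized weight of one lattice point: trace of the product of
    the core slices selected by the index."""
    M = np.eye(rank)
    for core, i in zip(cores, index):
        M = M @ core[:, i, :]
    return np.trace(M)

w = ring_weight(rng.integers(0, modes, size=D))
```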
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Title | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language |
Authors | Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner |
Abstract | We introduce the new task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, where the core idea is to learn a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor then correlates the language expressions with the underlying geometric features of the 3D scan and facilitates the regression of the 3D bounding box of the target object. In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D. |
Tasks | Object Localization, Sentence Embeddings |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08830v1 |
https://arxiv.org/pdf/1912.08830v1.pdf | |
PWC | https://paperswithcode.com/paper/scanrefer-3d-object-localization-in-rgb-d |
Repo | https://github.com/daveredrum/ScanRefer |
Framework | pytorch |
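The core idea — fuse per-proposal 3D features with a sentence embedding, then score each proposal and regress its box — can be sketched as a toy PyTorch module. Dimensions and layer choices below are hypothetical, not the released ScanRefer architecture.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Toy sketch: correlate K object proposals with one sentence
    embedding, then predict a match score and box residuals per proposal."""
    def __init__(self, prop_dim=128, lang_dim=256, hidden=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(prop_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.score = nn.Linear(hidden, 1)   # which proposal the text refers to
        self.box = nn.Linear(hidden, 6)     # center + size residuals

    def forward(self, proposals, sentence):
        # proposals: (B, K, prop_dim); sentence: (B, lang_dim)
        K = proposals.shape[1]
        lang = sentence.unsqueeze(1).expand(-1, K, -1)
        h = self.fuse(torch.cat([proposals, lang], dim=-1))
        return self.score(h).squeeze(-1), self.box(h)
```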
Removing input features via a generative model to explain their attributions to an image classifier’s decisions
Title | Removing input features via a generative model to explain their attributions to an image classifier’s decisions |
Authors | Chirag Agarwal, Dan Schonfeld, Anh Nguyen |
Abstract | Interpretability methods often measure the contribution of an input feature to an image classifier’s decisions by heuristically removing it via e.g. blurring, adding noise, or graying out—techniques that often produce unrealistic, out-of-sample inputs. Instead, we propose to integrate a generative inpainter into three representative attribution methods to remove an input feature. Compared to the original counterparts, our methods (1) generate more plausible counterfactual samples under the true data-generating process; (2) are more robust to hyperparameter changes; and (3) are more accurate according to three metrics: object localization, deletion, and saliency metrics. Our findings were consistent across both ImageNet and Places365 datasets and two different pairs of classifiers and inpainters. |
Tasks | Object Localization |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04256v3 |
https://arxiv.org/pdf/1910.04256v3.pdf | |
PWC | https://paperswithcode.com/paper/removing-input-features-via-a-generative-1 |
Repo | https://github.com/anguyen8/generative-attribution-methods |
Framework | pytorch |
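The occlusion idea with a generative remover instead of graying out is easy to sketch. The `classify(img) -> float` and `inpaint(img, mask) -> img` callables below are hypothetical stand-ins for the classifier's target-class score and the generative inpainter; the paper applies the same substitution inside three specific attribution methods.

```python
import numpy as np

def inpainting_occlusion_map(image, classify, inpaint, patch=16, stride=16):
    """Slide a square mask over the image; at each location, remove the
    patch with a generative inpainter (rather than blurring/graying) and
    record the drop in the classifier's target score."""
    H, W = image.shape[:2]
    base = classify(image)
    heat = np.zeros((H, W))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            mask = np.zeros((H, W), dtype=bool)
            mask[y:y + patch, x:x + patch] = True
            counterfactual = inpaint(image, mask)  # realistic removal
            heat[y:y + patch, x:x + patch] = base - classify(counterfactual)
    return heat
```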
Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models
Title | Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models |
Authors | Daniel Omeiza, Skyler Speakman, Celia Cintas, Komminist Weldermariam |
Abstract | Gaining insight into how deep convolutional neural network models perform image classification, and how to explain their outputs, has been a concern to computer vision researchers and decision makers. These deep models are often referred to as black boxes due to low comprehension of their internal workings. As part of the effort to develop explainable deep learning models, several methods have been proposed, such as finding gradients of the class output with respect to the input image (sensitivity maps), class activation maps (CAM), and gradient-based class activation maps (Grad-CAM). These methods underperform when localizing multiple occurrences of the same class and do not work for all CNNs. In addition, Grad-CAM does not capture the entire object when used on single-object images, which affects performance on recognition tasks. To create an enhanced visual explanation in terms of visual sharpness, object localization and explaining multiple occurrences of objects in a single image, we present Smooth Grad-CAM++ \footnote{Simple demo: http://35.238.22.135:5000/}, a technique that combines methods from two other recent techniques—SMOOTHGRAD and Grad-CAM++. Our Smooth Grad-CAM++ technique provides the capability of visualizing a layer, a subset of feature maps, or a subset of neurons within a feature map at each instance at the inference level (model prediction process). In experiments on a few images, Smooth Grad-CAM++ produced visually sharper maps with better localization of objects in the given input images when compared with other methods. |
Tasks | Image Classification, Object Localization |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01224v1 |
https://arxiv.org/pdf/1908.01224v1.pdf | |
PWC | https://paperswithcode.com/paper/smooth-grad-cam-an-enhanced-inference-level |
Repo | https://github.com/yiskw713/SmoothGradCAMplusplus |
Framework | pytorch |
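The combination is SmoothGrad's noise-averaged gradients plugged into Grad-CAM++'s weighting. A hook-based PyTorch sketch under simplifying assumptions: the alpha coefficients follow the published Grad-CAM++ formula with noise-averaged gradient moments, and activations are taken from the last noisy pass; the authors' implementation may differ in these details.

```python
import torch
import torch.nn.functional as F

def smooth_grad_campp(model, layer, image, target, n=25, sigma=0.1):
    """Grad-CAM++ weights computed from SmoothGrad-averaged gradient
    moments. `layer` is the conv layer to visualize; image: (1, C, H, W)."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        g1 = g2 = g3 = 0
        for _ in range(n):
            acts.clear(); grads.clear()
            noisy = image + sigma * torch.randn_like(image)
            score = model(noisy)[0, target]
            model.zero_grad(); score.backward()
            g = grads[0]
            g1, g2, g3 = g1 + g, g2 + g * g, g3 + g * g * g
        g1, g2, g3 = g1 / n, g2 / n, g3 / n        # smoothed moments
        A = acts[0]                                # last pass's activations
        alpha = g2 / (2 * g2 + A.sum(dim=(2, 3), keepdim=True) * g3 + 1e-8)
        weights = (alpha * F.relu(g1)).sum(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * A).sum(dim=1))     # (1, h, w)
        return cam / (cam.max() + 1e-8)
    finally:
        h1.remove(); h2.remove()
```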
Min-max Entropy for Weakly Supervised Pointwise Localization
Title | Min-max Entropy for Weakly Supervised Pointwise Localization |
Authors | Soufiane Belharbi, Jérôme Rony, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey, Eric Granger |
Abstract | Pointwise localization allows more precise localization and accurate interpretability, compared to bounding boxes, in applications where objects are highly unstructured, such as in the medical domain. In this work, we focus on weakly supervised localization (WSL), where a model is trained to classify an image and localize regions of interest at pixel level using only global image annotation. Typical convolutional attention maps are prone to high false-positive regions. To alleviate this issue, we propose a new deep learning method for WSL, composed of a localizer and a classifier, where the localizer is constrained to determine relevant and irrelevant regions using conditional entropy (CE), with the aim of reducing false-positive regions. Experimental results on a public medical dataset and two natural datasets, using the Dice index, show that, compared to state-of-the-art WSL methods, our proposal can provide significant improvements in terms of image-level classification and pixel-level localization (low false positives), with robustness to overfitting. A public reproducible PyTorch implementation is provided at: https://github.com/sbelharbi/wsol-min-max-entropy-interpretability . |
Tasks | Object Localization, Weakly-Supervised Object Localization |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.12934v4 |
https://arxiv.org/pdf/1907.12934v4.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-object-localization-using-3 |
Repo | https://github.com/sbelharbi/wsol-min-max-entropy-interpretability |
Framework | pytorch |
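The min-max conditional-entropy constraint — be confident on regions the localizer marks relevant, maximally uncertain on the rest — reduces to a simple loss term. A toy sketch with hypothetical tensor shapes; the full method couples this with the localizer/classifier architecture and the image-level classification loss.

```python
import torch

def min_max_entropy_loss(probs_relevant, probs_irrelevant, eps=1e-8):
    """probs_*: (N, C) class posteriors pooled over regions the localizer
    marks relevant / irrelevant. Minimizing this pushes entropy down where
    the model should be confident and up elsewhere, suppressing
    false-positive regions."""
    def entropy(p):
        return -(p * (p + eps).log()).sum(dim=1).mean()
    return entropy(probs_relevant) - entropy(probs_irrelevant)
```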
RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification
Title | RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification |
Authors | Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu |
Abstract | Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification. Adjustment of model architecture using a pre-training scheme can extract speaker embeddings, giving a significant improvement in performance. Additional objective functions simplify the process of extracting speaker embeddings by merging conventional two-phase processes: extracting utterance-level features such as i-vectors or x-vectors and the feature enhancement phase, e.g., linear discriminant analysis. Effective back-end classification models that suit the proposed speaker embedding are also explored. We propose an end-to-end system that comprises two deep neural networks, one front-end for utterance-level speaker embedding extraction and the other for back-end classification. Experiments conducted on the VoxCeleb1 dataset demonstrate that the proposed model achieves state-of-the-art performance among systems without data augmentation. The proposed system is also comparable to the state-of-the-art x-vector system that adopts data augmentation. |
Tasks | Data Augmentation, Speaker Verification, Text-Independent Speaker Verification |
Published | 2019-04-17 |
URL | https://arxiv.org/abs/1904.08104v2 |
https://arxiv.org/pdf/1904.08104v2.pdf | |
PWC | https://paperswithcode.com/paper/rawnet-advanced-end-to-end-deep-neural |
Repo | https://github.com/Jungjee/RawNet |
Framework | tf |
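A toy version of the two-network design — a raw-waveform front-end producing an utterance-level speaker embedding, plus a back-end scoring trial pairs — sketched in PyTorch (the listed repo is TensorFlow). Layer sizes are hypothetical, and the real RawNet uses residual blocks, GRU aggregation, and a learned back-end classifier rather than plain cosine scoring.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RawFrontEnd(nn.Module):
    """Strided 1-D convolutions directly on the waveform, then temporal
    average pooling into a fixed-size speaker embedding."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=10), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, stride=2), nn.ReLU())
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, wav):                # wav: (B, samples)
        h = self.net(wav.unsqueeze(1))     # (B, 128, T')
        return self.proj(h.mean(dim=2))    # pool over time

def verify(model, wav_a, wav_b, threshold=0.7):
    """Cosine-similarity back-end for a single trial pair."""
    ea, eb = model(wav_a), model(wav_b)
    return F.cosine_similarity(ea, eb).item() >= threshold
```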
Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Title | Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation |
Authors | Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Michael Bronstein, Stefanos Zafeiriou |
Abstract | Generative models for 3D geometric data arise in many important applications in 3D computer vision and graphics. In this paper, we focus on 3D deformable shapes that share a common topological structure, such as human faces and bodies. Morphable Models and their variants, despite their linear formulation, have been widely used for shape representation, while most of the recently proposed nonlinear approaches resort to intermediate representations, such as 3D voxel grids or 2D views. In this work, we introduce a novel graph convolutional operator, acting directly on the 3D mesh, that explicitly models the inductive bias of the fixed underlying graph. This is achieved by enforcing consistent local orderings of the vertices of the graph, through the spiral operator, thus breaking the permutation invariance property that is adopted by all the prior work on Graph Neural Networks. Our operator comes by construction with desirable properties (anisotropic, topology-aware, lightweight, easy-to-optimise), and by using it as a building block for traditional deep generative architectures, we demonstrate state-of-the-art results on a variety of 3D shape datasets compared to the linear Morphable Model and other graph convolutional operators. |
Tasks | 3D Shape Representation, Representation Learning |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.02876v3 |
https://arxiv.org/pdf/1905.02876v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-3d-morphable-models-spiral |
Repo | https://github.com/gbouritsas/Neural3DMM |
Framework | pytorch |
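The spiral operator's trick is that a fixed local vertex ordering lets one shared linear layer act like a convolution on the mesh. A minimal PyTorch sketch, assuming precomputed spiral index lists of equal length per vertex (padding or truncating spirals to a common length is the usual practical detail the sketch glosses over).

```python
import torch
import torch.nn as nn

class SpiralConv(nn.Module):
    """Gather each vertex's neighborhood along a precomputed spiral
    ordering and apply one shared linear layer. The fixed ordering
    deliberately breaks permutation invariance, which is the point of
    the operator."""
    def __init__(self, in_ch, out_ch, spiral_len):
        super().__init__()
        self.layer = nn.Linear(in_ch * spiral_len, out_ch)

    def forward(self, x, spirals):
        # x: (B, V, in_ch); spirals: (V, spiral_len) long tensor of indices
        B, V, C = x.shape
        gathered = x[:, spirals.reshape(-1), :]   # (B, V*L, C)
        gathered = gathered.reshape(B, V, -1)     # (B, V, L*C)
        return self.layer(gathered)
```

Because the mesh topology is fixed across the dataset, the spirals are computed once and reused for every shape, keeping the operator lightweight.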
Joint Multi-frame Detection and Segmentation for Multi-cell Tracking
Title | Joint Multi-frame Detection and Segmentation for Multi-cell Tracking |
Authors | Zibin Zhou, Fei Wang, Wenjuan Xi, Huaying Chen, Peng Gao, Chengkang He |
Abstract | Tracking living cells in video sequences is difficult because of changing cell morphology and the high similarity between cells. Tracking-by-detection methods are widely used in multi-cell tracking. We perform multi-cell tracking based on cell centroid detection, and the performance of the detector has a high impact on tracking performance. In this paper, a UNet is utilized to extract inter-frame and intra-frame spatio-temporal information of cells. Detection performance for cells in the mitotic phase is improved by multi-frame input. Good detection results facilitate multi-cell tracking. A mitosis detection algorithm is proposed to detect cell mitosis, and the cell lineage is built up. Another UNet is utilized to acquire a primary segmentation. By jointly using detection and primary segmentation, cells can be finely segmented even in highly dense cell populations. Experiments are conducted to evaluate the effectiveness of our method, and the results show its state-of-the-art performance. |
Tasks | Mitosis Detection |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10886v1 |
https://arxiv.org/pdf/1906.10886v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-multi-frame-detection-and-segmentation |
Repo | https://github.com/zhousam/Joint-Multi-frame-Detection-and-Segmentation-for-Multi-cell-Tracking |
Framework | none |
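Two pieces of the pipeline are easy to sketch: feeding several consecutive frames to the detector as channels, and reading cell centroids off the predicted heatmap as local maxima. The threshold and window size below are hypothetical, and the paper's detector is a full UNet rather than this stand-in post-processing.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def stack_frames(frames):
    """Stack consecutive grayscale frames as input channels so the
    detector sees inter-frame context (this is what helps cells in the
    mitotic phase)."""
    return np.stack(frames, axis=0)     # list of (H, W) -> (T, H, W)

def centroids_from_heatmap(heatmap, threshold=0.5, window=5):
    """Cell centroids = thresholded local maxima of the detection heatmap."""
    peaks = (heatmap == maximum_filter(heatmap, size=window))
    ys, xs = np.nonzero(peaks & (heatmap > threshold))
    return list(zip(ys.tolist(), xs.tolist()))
```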
A Repository of Conversational Datasets
Title | A Repository of Conversational Datasets |
Authors | Matthew Henderson, Paweł Budzianowski, Iñigo Casanueva, Sam Coope, Daniela Gerz, Girish Kumar, Nikola Mrkšić, Georgios Spithourakis, Pei-Hao Su, Ivan Vulić, Tsung-Hsien Wen |
Abstract | Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using ‘1-of-100 accuracy’. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set. |
Tasks | Conversational Response Selection, Dialogue Understanding |
Published | 2019-04-13 |
URL | https://arxiv.org/abs/1904.06472v2 |
https://arxiv.org/pdf/1904.06472v2.pdf | |
PWC | https://paperswithcode.com/paper/190406472 |
Repo | https://github.com/qinguangjun/conversational-datasets |
Framework | tf |
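The ‘1-of-100 accuracy’ metric is simple to state in code: each context's true response is scored against the other 99 responses in the batch, and we count how often the true response ranks first. A minimal numpy sketch, assuming precomputed dual-encoder context/response encodings scored by dot product as in the repository's baselines.

```python
import numpy as np

def one_of_100_accuracy(context_enc, response_enc):
    """context_enc, response_enc: (100, d) arrays aligned so that row i
    of each forms a true (context, response) pair; the other 99 responses
    act as distractors. Returns the fraction of contexts whose true
    response scores highest."""
    scores = context_enc @ response_enc.T   # (100, 100) similarity matrix
    return float(np.mean(scores.argmax(axis=1) == np.arange(len(scores))))
```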
JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Title | JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation |
Authors | Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer |
Abstract | Interactive programming with interleaved code snippet cells and natural language markdown is recently gaining popularity in the form of Jupyter notebooks, which accelerate prototyping and collaboration. To study code generation conditioned on a long context history, we present JuICe, a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. Compared with existing contextual code generation datasets, JuICe provides refined human-curated data, open-domain code, and an order of magnitude more training data. Using JuICe, we train models for two tasks: (1) generation of the API call sequence in a code cell, and (2) full code cell generation, both conditioned on the NL-Code history up to a particular code cell. Experiments using current baseline code generation models show that both context and distant supervision aid in generation, and that the dataset is challenging for current systems. |
Tasks | Code Generation |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02216v2 |
https://arxiv.org/pdf/1910.02216v2.pdf | |
PWC | https://paperswithcode.com/paper/juice-a-large-scale-distantly-supervised |
Repo | https://github.com/rajasagashe/juice |
Framework | none |
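Both JuICe tasks condition on the NL-code history up to a target cell. A sketch of slicing that history out of a notebook, assuming the standard .ipynb JSON layout; the dataset's actual serialized schema and preprocessing may differ.

```python
import json

def history_and_target(notebook_path, target_index):
    """Return (context cells, target code cell) for context-conditioned
    code generation: all markdown/code cells before the target, plus the
    target code cell itself."""
    with open(notebook_path) as f:
        cells = json.load(f)["cells"]
    context = [("".join(c["source"]), c["cell_type"])
               for c in cells[:target_index]]
    target_cell = cells[target_index]
    assert target_cell["cell_type"] == "code"
    return context, "".join(target_cell["source"])
```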