February 1, 2020

3133 words 15 mins read

Paper Group AWR 330

Arabic Text Diacritization Using Deep Neural Networks. Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation. Topological Machine Learning for Multivariate Time Series. Uncovering the Semantics of Wikipedia Categories. Estimating Information-Theoretic Quantities with Uncertainty Forests. A Prior of a Googol Gaus …

Arabic Text Diacritization Using Deep Neural Networks

Title Arabic Text Diacritization Using Deep Neural Networks
Authors Ali Fadel, Ibraheem Tuffaha, Bara’ Al-Jawarneh, Mahmoud Al-Ayyoub
Abstract Diacritization of Arabic text is an interesting yet challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. As with many other problems in Arabic language processing, the limited effort invested in this problem and the lack of available (open-source) resources hinder progress towards solving it. This work provides a critical review of the currently existing systems, measures and resources for Arabic text diacritization. Moreover, it introduces a much-needed free-for-all cleaned dataset that can be easily used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The results of the experiments show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools, achieving a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, the best DER among non-neural approaches (obtained by the Mishkal tool).
Tasks Arabic Text Diacritization
Published 2019-04-25
URL http://arxiv.org/abs/1905.01965v1
PDF http://arxiv.org/pdf/1905.01965v1.pdf
PWC https://paperswithcode.com/paper/190501965
Repo https://github.com/Barqawiz/Shakkala
Framework tf
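
The DER figures quoted above can be computed once predicted and reference diacritics are aligned character by character. Below is a minimal sketch of such a Diacritic Error Rate, assuming pre-aligned (letter, diacritic) sequences; published DER definitions vary in details such as whether case endings or undiacritized letters are counted, so treat this as illustrative:

```python
def diacritic_error_rate(reference, predicted):
    """Fraction of characters whose predicted diacritic differs from the
    reference. Both inputs are assumed to be aligned lists of
    (letter, diacritic) pairs; this is a simplified definition."""
    assert len(reference) == len(predicted)
    errors = sum(d_ref != d_pred
                 for (_, d_ref), (_, d_pred) in zip(reference, predicted))
    return errors / len(reference)
```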

Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation

Title Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation
Authors Robert B. Grupp, Rachel A. Hegeman, Ryan J. Murphy, Clayton P. Alexander, Yoshito Otake, Benjamin A. McArthur, Mehran Armand, Russell H. Taylor
Abstract Objective: State of the art navigation systems for pelvic osteotomies use optical systems with external fiducials. We propose the use of X-Ray navigation for pose estimation of periacetabular fragments without fiducials. Methods: A 2D/3D registration pipeline was developed to recover fragment pose. This pipeline was tested through an extensive simulation study and 6 cadaveric surgeries. Using osteotomy boundaries in the fluoroscopic images, the preoperative plan is refined to more accurately match the intraoperative shape. Results: In simulation, average fragment pose errors were 1.3°/1.7 mm when the planned fragment matched the intraoperative fragment, 2.2°/2.1 mm when the plan was not updated to match the true shape, and 1.9°/2.0 mm when the fragment shape was intraoperatively estimated. In cadaver experiments, the average pose errors were 2.2°/2.2 mm, 3.8°/2.5 mm, and 3.5°/2.2 mm when registering with the actual fragment shape, a preoperative plan, and an intraoperatively refined plan, respectively. Average errors of the lateral center edge angle were less than 2° for all fragment shapes in simulation and cadaver experiments. Conclusion: The proposed pipeline is capable of accurately reporting femoral head coverage within a range clinically identified for long-term joint survivability. Significance: Human interpretation of fragment pose is challenging and usually restricted to rotation about a single anatomical axis. The proposed pipeline provides an intraoperative estimate of rigid pose with respect to all anatomical axes, is compatible with minimally invasive incisions, and has no dependence on external fiducials.
Tasks Pose Estimation
Published 2019-03-22
URL https://arxiv.org/abs/1903.09339v2
PDF https://arxiv.org/pdf/1903.09339v2.pdf
PWC https://paperswithcode.com/paper/pose-estimation-of-periacetabular-osteotomy
Repo https://github.com/rg2/DeepFluoroLabeling-IPCAI2020
Framework pytorch
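
The paired degree/millimetre errors in the abstract are the conventional way of summarizing a rigid-pose error: a geodesic rotation error on SO(3) plus a Euclidean translation distance. A small sketch, assuming estimated and ground-truth poses as 3x3 rotation matrices and translation vectors (the paper's exact error definition may differ in details such as the reference frame):

```python
import numpy as np

def pose_error(R_est, t_est, R_true, t_true):
    # Rotation error: geodesic distance on SO(3), reported in degrees.
    R_delta = R_est @ R_true.T
    cos_theta = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_theta))
    # Translation error: Euclidean distance, in the units of t (e.g. mm).
    trans_err = np.linalg.norm(np.asarray(t_est) - np.asarray(t_true))
    return rot_err_deg, trans_err
```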

Topological Machine Learning for Multivariate Time Series

Title Topological Machine Learning for Multivariate Time Series
Authors Chengyuan Wu, Carol Anne Hargreaves
Abstract We develop a framework for analyzing multivariate time series using topological data analysis (TDA) methods. The proposed methodology involves converting the multivariate time series to point cloud data, calculating Wasserstein distances between the persistence diagrams and using the $k$-nearest neighbors algorithm ($k$-NN) for supervised machine learning. Two methods (symmetry-breaking and anchor points) are also introduced to enable TDA to better analyze data with heterogeneous features that are sensitive to translation, rotation, or choice of coordinates. We apply our methods to room occupancy detection based on 5 time-dependent variables (temperature, humidity, light, CO2 and humidity ratio). Experimental results show that topological methods are effective in predicting room occupancy during a time window.
Tasks Time Series, Topological Data Analysis
Published 2019-11-27
URL https://arxiv.org/abs/1911.12082v1
PDF https://arxiv.org/pdf/1911.12082v1.pdf
PWC https://paperswithcode.com/paper/topological-machine-learning-for-multivariate
Repo https://github.com/wuchengyuan88/room-occupancy-topology
Framework none
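
The first step described above, converting a multivariate time series into a point cloud, is commonly done with a sliding-window (time-delay style) embedding. A minimal sketch under that assumption; the paper's exact construction, and its symmetry-breaking and anchor-point refinements, are not reproduced here:

```python
import numpy as np

def sliding_window_point_cloud(series, window):
    """Turn a (T, d) multivariate time series into a point cloud in
    R^(window * d) by stacking consecutive observations."""
    T, d = series.shape
    return np.stack([series[i:i + window].ravel()
                     for i in range(T - window + 1)])
```

Persistence diagrams would then be computed on such clouds (e.g. with a TDA library such as ripser), and Wasserstein distances between diagrams supply the metric for the k-NN classifier.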

Uncovering the Semantics of Wikipedia Categories

Title Uncovering the Semantics of Wikipedia Categories
Authors Nicolas Heist, Heiko Paulheim
Abstract The Wikipedia category graph serves as the taxonomic backbone for large-scale knowledge graphs like YAGO or Probase, and has been used extensively for tasks like entity disambiguation or semantic similarity estimation. Wikipedia’s categories are a rich source of taxonomic as well as non-taxonomic information. The category ‘German science fiction writers’, for example, encodes the type of its resources (Writer), as well as their nationality (German) and genre (Science Fiction). Several approaches in the literature make use of fractions of this encoded information without exploiting its full potential. In this paper, we introduce an approach for the discovery of category axioms that uses information from the category network, category instances, and their lexicalisations. With DBpedia as background knowledge, we discover 703k axioms covering 502k of Wikipedia’s categories and populate the DBpedia knowledge graph with an additional 4.4M relation assertions and 3.3M type assertions at more than 87% and 90% precision, respectively.
Tasks Entity Disambiguation, Knowledge Graphs, Semantic Similarity, Semantic Textual Similarity
Published 2019-06-28
URL https://arxiv.org/abs/1906.12089v1
PDF https://arxiv.org/pdf/1906.12089v1.pdf
PWC https://paperswithcode.com/paper/uncovering-the-semantics-of-wikipedia
Repo https://github.com/nheist/Cat2Ax
Framework none
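
As a loose illustration of the axiom-mining idea (not the Cat2Ax algorithm itself), one can propose a (property, value) axiom for a category when nearly all of its instances share that fact in the background knowledge graph. All names and the threshold below are hypothetical:

```python
from collections import Counter

def propose_axioms(category_instances, instance_facts, threshold=0.9):
    """Propose (property, value) axioms that hold for at least `threshold`
    of a category's instances. `instance_facts` maps an instance to a set
    of (property, value) pairs from the background knowledge graph."""
    counts = Counter()
    for inst in category_instances:
        for fact in instance_facts.get(inst, set()):
            counts[fact] += 1
    n = len(category_instances)
    return [fact for fact, c in counts.items() if n and c / n >= threshold]
```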

Estimating Information-Theoretic Quantities with Uncertainty Forests

Title Estimating Information-Theoretic Quantities with Uncertainty Forests
Authors Richard Guo, Ronak Mehta, Jesus Arroyo, Hayden Helm, Cencheng Shen, Joshua T. Vogelstein
Abstract Information-theoretic quantities, such as mutual information and conditional entropy, are useful statistics for measuring the dependence between two random variables. However, estimating these quantities in a non-parametric fashion is difficult, especially when the variables are high-dimensional, a mixture of continuous and discrete values, or both. In this paper, we propose a decision forest method, Conditional Forests (CF), to estimate these quantities. By combining quantile regression forests with honest sampling, and introducing a finite sample correction, CF improves finite sample bias in a range of settings. We demonstrate through simulations that CF achieves smaller bias and variance in both low- and high-dimensional settings for estimating posteriors, conditional entropy, and mutual information. We then use CF to estimate the amount of information between neuron class and other cellular features.
Tasks
Published 2019-06-30
URL https://arxiv.org/abs/1907.00325v3
PDF https://arxiv.org/pdf/1907.00325v3.pdf
PWC https://paperswithcode.com/paper/estimating-information-theoretic-quantities
Repo https://github.com/neurodata/uncertainty-forest
Framework none
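
The quantities being estimated connect through the identity I(X;Y) = H(Y) − H(Y|X), so any method that yields calibrated per-sample posteriors p(y|x) gives a plug-in estimator. A minimal sketch of that plug-in step, assuming the forest's posterior estimates are already computed (this is the generic recipe, not the paper's specific honest-sampling and finite-sample corrections):

```python
import numpy as np

def mutual_information_from_posteriors(posteriors, class_prior):
    """Plug-in estimate of I(X;Y) = H(Y) - H(Y|X).
    posteriors: (N, C) array of estimated p(y|x_i) per sample.
    class_prior: (C,) array of estimated p(y)."""
    eps = 1e-12
    h_y = -np.sum(class_prior * np.log(class_prior + eps))
    h_y_given_x = -np.mean(
        np.sum(posteriors * np.log(posteriors + eps), axis=1))
    return h_y - h_y_given_x
```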

A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models

Title A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
Authors Maksim Kuznetsov, Daniil Polykovskiy, Dmitry Vetrov, Alexander Zhebrak
Abstract Generative models produce realistic objects in many domains, including text, image, video, and audio synthesis. Most popular models—Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs)—usually employ a standard Gaussian distribution as a prior. Previous works show that a richer family of prior distributions may help to avoid the mode collapse problem in GANs and to improve the evidence lower bound in VAEs. We propose a new family of prior distributions—Tensor Ring Induced Prior (TRIP)—that packs an exponential number of Gaussians into a high-dimensional lattice with a relatively small number of parameters. We show that these priors improve Fréchet Inception Distance for GANs and Evidence Lower Bound for VAEs. We also study generative models with TRIP in the conditional generation setup with missing conditions. Altogether, we propose a novel plug-and-play framework for generative models that can be utilized in any GAN and VAE-like architectures.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13148v1
PDF https://arxiv.org/pdf/1910.13148v1.pdf
PWC https://paperswithcode.com/paper/a-prior-of-a-googol-gaussians-a-tensor-ring
Repo https://github.com/insilicomedicine/TRIP
Framework pytorch
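
The "exponentially many Gaussians from few parameters" claim rests on the tensor-ring decomposition: the mixture weight of each component on the lattice is the trace of a product of small core matrices, so a d-dimensional lattice with K values per axis encodes K^d weights using only on the order of d·K·r² parameters. A toy sketch of that contraction (illustrative only, not the authors' implementation):

```python
import numpy as np

def tensor_ring_weight(cores, indices):
    """Unnormalized weight of one lattice component:
    w[i1..id] = Tr(G1[i1] @ G2[i2] @ ... @ Gd[id]),
    where cores[k] has shape (K_k, r_k, r_{k+1}) and the ring closes,
    i.e. the first and last ranks match."""
    acc = cores[0][indices[0]]
    for core, idx in zip(cores[1:], indices[1:]):
        acc = acc @ core[idx]
    return np.trace(acc)
```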

ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Title ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Authors Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner
Abstract We introduce the new task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, where the core idea is to learn a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor then correlates the language expressions with the underlying geometric features of the 3D scan and facilitates the regression of the 3D bounding box of the target object. In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D.
Tasks Object Localization, Sentence Embeddings
Published 2019-12-18
URL https://arxiv.org/abs/1912.08830v1
PDF https://arxiv.org/pdf/1912.08830v1.pdf
PWC https://paperswithcode.com/paper/scanrefer-3d-object-localization-in-rgb-d
Repo https://github.com/daveredrum/ScanRefer
Framework pytorch
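
The core fusion idea, scoring each 3D proposal against the encoded sentence, can be sketched as concatenating proposal features with the language embedding and scoring with a small MLP. The module below is a toy PyTorch sketch with hypothetical dimensions, not the ScanRefer architecture itself:

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Score 3D proposals against a sentence embedding by fusing
    per-proposal features with the language feature (sizes illustrative)."""
    def __init__(self, prop_dim=128, lang_dim=256, hidden=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(prop_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, proposal_feats, lang_feat):
        # proposal_feats: (P, prop_dim); lang_feat: (lang_dim,)
        lang = lang_feat.expand(proposal_feats.size(0), -1)
        return self.fuse(torch.cat([proposal_feats, lang], dim=-1)).squeeze(-1)
```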

Removing input features via a generative model to explain their attributions to an image classifier’s decisions

Title Removing input features via a generative model to explain their attributions to an image classifier’s decisions
Authors Chirag Agarwal, Dan Schonfeld, Anh Nguyen
Abstract Interpretability methods often measure the contribution of an input feature to an image classifier’s decisions by heuristically removing it via e.g. blurring, adding noise, or graying out, which often produce unrealistic, out-of-distribution samples. Instead, we propose to integrate a generative inpainter into three representative attribution methods to remove an input feature. Compared to the original counterparts, our methods (1) generate more plausible counterfactual samples under the true data generating process; (2) are more robust to hyperparameter changes; and (3) are more accurate according to three metrics: object localization, deletion and saliency metrics. Our findings were consistent across both ImageNet and Places365 datasets and two different pairs of classifiers and inpainters.
Tasks Object Localization
Published 2019-10-09
URL https://arxiv.org/abs/1910.04256v3
PDF https://arxiv.org/pdf/1910.04256v3.pdf
PWC https://paperswithcode.com/paper/removing-input-features-via-a-generative-1
Repo https://github.com/anguyen8/generative-attribution-methods
Framework pytorch
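
The paper's central move, replacing heuristic feature removal with generative in-filling, can be illustrated with an occlusion-style attribution loop where each masked patch is filled by an inpainter before re-scoring. A rough sketch in which `classifier` (returning a target-class score) and `inpaint` are assumed callables, not the authors' interfaces, and the image size is assumed divisible by the patch size:

```python
import numpy as np

def inpainting_occlusion_map(image, classifier, inpaint, patch=16):
    """Attribution by removing patches via a generative inpainter instead
    of gray/blur fills, then measuring the score drop per patch."""
    H, W = image.shape[:2]          # assumes H and W divisible by `patch`
    base = classifier(image)
    heat = np.zeros((H // patch, W // patch))
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            mask = np.zeros((H, W), dtype=bool)
            mask[i:i + patch, j:j + patch] = True
            filled = inpaint(image, mask)   # plausible in-fill of the patch
            heat[i // patch, j // patch] = base - classifier(filled)
    return heat
```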

Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models

Title Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models
Authors Daniel Omeiza, Skyler Speakman, Celia Cintas, Komminist Weldermariam
Abstract Gaining insight into how deep convolutional neural network models perform image classification and how to explain their outputs has been a concern for computer vision researchers and decision makers. These deep models are often referred to as black boxes due to low comprehension of their internal workings. In an effort to develop explainable deep learning models, several methods have been proposed, such as finding gradients of the class output with respect to the input image (sensitivity maps), class activation maps (CAM), and gradient-based class activation maps (Grad-CAM). These methods underperform when localizing multiple occurrences of the same class and do not work for all CNNs. In addition, Grad-CAM does not capture objects in their entirety when used on single-object images, which affects performance on recognition tasks. With the intention of creating an enhanced visual explanation in terms of visual sharpness, object localization and explaining multiple occurrences of objects in a single image, we present Smooth Grad-CAM++ (simple demo: http://35.238.22.135:5000/), a technique that combines methods from two other recent techniques—SMOOTHGRAD and Grad-CAM++. Our Smooth Grad-CAM++ technique provides the capability of visualizing a layer, a subset of feature maps, or a subset of neurons within a feature map at each instance at the inference level (model prediction process). After experimenting with a few images, Smooth Grad-CAM++ produced visually sharper maps with better localization of objects in the given input images when compared with other methods.
Tasks Image Classification, Object Localization
Published 2019-08-03
URL https://arxiv.org/abs/1908.01224v1
PDF https://arxiv.org/pdf/1908.01224v1.pdf
PWC https://paperswithcode.com/paper/smooth-grad-cam-an-enhanced-inference-level
Repo https://github.com/yiskw713/SmoothGradCAMplusplus
Framework pytorch
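
The combination named in the abstract can be sketched directly: apply SMOOTHGRAD's trick of averaging over noisy copies of the input, but average Grad-CAM++ maps instead of raw gradients. A minimal sketch assuming a `cam_fn` callable that produces a Grad-CAM++ map for a batch (hyperparameters are illustrative):

```python
import torch

def smooth_grad_cam_pp(images, cam_fn, n=25, sigma=0.1):
    """Average Grad-CAM++ maps over `n` Gaussian-perturbed copies of the
    input, SmoothGrad-style. `cam_fn(x)` is an assumed callable."""
    maps = []
    for _ in range(n):
        noisy = images + sigma * torch.randn_like(images)
        maps.append(cam_fn(noisy))
    return torch.stack(maps).mean(dim=0)
```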

Min-max Entropy for Weakly Supervised Pointwise Localization

Title Min-max Entropy for Weakly Supervised Pointwise Localization
Authors Soufiane Belharbi, Jérôme Rony, Jose Dolz, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
Abstract Pointwise localization allows more precise localization and more accurate interpretability than bounding boxes in applications where objects are highly unstructured, such as in the medical domain. In this work, we focus on weakly supervised localization (WSL), where a model is trained to classify an image and localize regions of interest at the pixel level using only global image annotations. Typical convolutional attention maps are prone to high false positive regions. To alleviate this issue, we propose a new deep learning method for WSL, composed of a localizer and a classifier, where the localizer is constrained to determine relevant and irrelevant regions using conditional entropy (CE) with the aim of reducing false positive regions. Experimental results on a public medical dataset and two natural datasets, using the Dice index, show that, compared to state-of-the-art WSL methods, our proposal can provide significant improvements in terms of image-level classification and pixel-level localization (low false positives) with robustness to overfitting. A public reproducible PyTorch implementation is provided at: https://github.com/sbelharbi/wsol-min-max-entropy-interpretability .
Tasks Object Localization, Weakly-Supervised Object Localization
Published 2019-07-25
URL https://arxiv.org/abs/1907.12934v4
PDF https://arxiv.org/pdf/1907.12934v4.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-object-localization-using-3
Repo https://github.com/sbelharbi/wsol-min-max-entropy-interpretability
Framework pytorch
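
The conditional-entropy constraint at the heart of the method needs a per-pixel entropy of the localizer's posterior, which is minimized over regions deemed relevant and maximized over irrelevant ones. A small sketch of that entropy term (the full min-max training loop is in the linked repository):

```python
import torch
import torch.nn.functional as F

def pixelwise_entropy(logits):
    """Per-pixel entropy of the class posterior.
    logits: (B, C, H, W) -> returns (B, H, W)."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-12))).sum(dim=1)
```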

RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification

Title RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification
Authors Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu
Abstract Recently, direct modeling of raw waveforms using deep neural networks has been widely studied for a number of tasks in audio domains. In speaker verification, however, utilization of raw waveforms is in its preliminary phase, requiring further investigation. In this study, we explore end-to-end deep neural networks that input raw waveforms to improve various aspects: front-end speaker embedding extraction including model architecture, pre-training scheme, additional objective functions, and back-end classification. Adjustment of model architecture using a pre-training scheme can extract speaker embeddings, giving a significant improvement in performance. Additional objective functions simplify the process of extracting speaker embeddings by merging conventional two-phase processes: extracting utterance-level features such as i-vectors or x-vectors and the feature enhancement phase, e.g., linear discriminant analysis. Effective back-end classification models that suit the proposed speaker embedding are also explored. We propose an end-to-end system that comprises two deep neural networks, one front-end for utterance-level speaker embedding extraction and the other for back-end classification. Experiments conducted on the VoxCeleb1 dataset demonstrate that the proposed model achieves state-of-the-art performance among systems without data augmentation. The proposed system is also comparable to the state-of-the-art x-vector system that adopts data augmentation.
Tasks Data Augmentation, Speaker Verification, Text-Independent Speaker Verification
Published 2019-04-17
URL https://arxiv.org/abs/1904.08104v2
PDF https://arxiv.org/pdf/1904.08104v2.pdf
PWC https://paperswithcode.com/paper/rawnet-advanced-end-to-end-deep-neural
Repo https://github.com/Jungjee/RawNet
Framework tf
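
The front-end described above, embedding speakers directly from raw samples, amounts to strided 1-D convolutions over the waveform followed by pooling to an utterance-level vector. The listed framework is TensorFlow; the sketch below uses PyTorch with illustrative layer sizes and is not the RawNet architecture itself:

```python
import torch.nn as nn

class RawFrontEnd(nn.Module):
    """Minimal raw-waveform speaker embedding extractor: strided 1-D
    convolutions over samples, pooled to a fixed-size utterance vector."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=251, stride=5), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.proj = nn.Linear(128, emb_dim)

    def forward(self, wav):              # wav: (B, 1, num_samples)
        return self.proj(self.conv(wav).squeeze(-1))
```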

Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation

Title Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Authors Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Michael Bronstein, Stefanos Zafeiriou
Abstract Generative models for 3D geometric data arise in many important applications in 3D computer vision and graphics. In this paper, we focus on 3D deformable shapes that share a common topological structure, such as human faces and bodies. Morphable Models and their variants, despite their linear formulation, have been widely used for shape representation, while most of the recently proposed nonlinear approaches resort to intermediate representations, such as 3D voxel grids or 2D views. In this work, we introduce a novel graph convolutional operator, acting directly on the 3D mesh, that explicitly models the inductive bias of the fixed underlying graph. This is achieved by enforcing consistent local orderings of the vertices of the graph, through the spiral operator, thus breaking the permutation invariance property that is adopted by all the prior work on Graph Neural Networks. Our operator comes by construction with desirable properties (anisotropic, topology-aware, lightweight, easy-to-optimise), and by using it as a building block for traditional deep generative architectures, we demonstrate state-of-the-art results on a variety of 3D shape datasets compared to the linear Morphable Model and other graph convolutional operators.
Tasks 3D Shape Representation, Representation Learning
Published 2019-05-08
URL https://arxiv.org/abs/1905.02876v3
PDF https://arxiv.org/pdf/1905.02876v3.pdf
PWC https://paperswithcode.com/paper/neural-3d-morphable-models-spiral
Repo https://github.com/gbouritsas/Neural3DMM
Framework pytorch
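
The spiral operator can be sketched concisely: each vertex gathers features from a fixed, precomputed spiral ordering of its neighbourhood and applies a shared linear map, which is what breaks permutation invariance. A minimal PyTorch sketch under those assumptions (spiral computation and padding for boundary vertices are omitted):

```python
import torch.nn as nn

class SpiralConv(nn.Module):
    """Spiral convolution sketch: gather each vertex's spiral neighbourhood
    and apply one shared linear map across all vertices."""
    def __init__(self, in_dim, out_dim, spiral_len):
        super().__init__()
        self.fc = nn.Linear(in_dim * spiral_len, out_dim)

    def forward(self, x, spirals):
        # x: (B, V, in_dim); spirals: (V, spiral_len) long tensor of indices.
        B, V, _ = x.shape
        gathered = x[:, spirals.reshape(-1)].reshape(B, V, -1)
        return self.fc(gathered)
```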

Joint Multi-frame Detection and Segmentation for Multi-cell Tracking

Title Joint Multi-frame Detection and Segmentation for Multi-cell Tracking
Authors Zibin Zhou, Fei Wang, Wenjuan Xi, Huaying Chen, Peng Gao, Chengkang He
Abstract Tracking living cells in video sequences is difficult because of changing cell morphology and the high similarity between cells. Tracking-by-detection methods are widely used in multi-cell tracking. We perform multi-cell tracking based on cell centroid detection, and the performance of the detector has a high impact on tracking performance. In this paper, a UNet is utilized to extract inter-frame and intra-frame spatio-temporal information of cells. Detection performance for cells in the mitotic phase is improved by multi-frame input. Good detection results facilitate multi-cell tracking. A mitosis detection algorithm is proposed to detect cell mitosis, and the cell lineage is built up. Another UNet is utilized to acquire a primary segmentation. By jointly using detection and primary segmentation, cells can be finely segmented in highly dense cell populations. Experiments are conducted to evaluate the effectiveness of our method, and the results show its state-of-the-art performance.
Tasks Mitosis Detection
Published 2019-06-26
URL https://arxiv.org/abs/1906.10886v1
PDF https://arxiv.org/pdf/1906.10886v1.pdf
PWC https://paperswithcode.com/paper/joint-multi-frame-detection-and-segmentation
Repo https://github.com/zhousam/Joint-Multi-frame-Detection-and-Segmentation-for-Multi-cell-Tracking
Framework none
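
Tracking-by-detection of centroids, as used above, needs a frame-to-frame association step. A generic choice (an assumption here, not necessarily the authors' algorithm) is Hungarian matching on centroid distances with a gating threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_centroids(prev, curr, max_dist=20.0):
    """Associate detected cell centroids across two frames.
    prev: (M, 2) and curr: (N, 2) centroid coordinates; returns index
    pairs whose matched distance is within `max_dist` pixels."""
    cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```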

A Repository of Conversational Datasets

Title A Repository of Conversational Datasets
Authors Matthew Henderson, Paweł Budzianowski, Iñigo Casanueva, Sam Coope, Daniela Gerz, Girish Kumar, Nikola Mrkšić, Georgios Spithourakis, Pei-Hao Su, Ivan Vulić, Tsung-Hsien Wen
Abstract Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using ‘1-of-100 accuracy’. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.
Tasks Conversational Response Selection, Dialogue Understanding
Published 2019-04-13
URL https://arxiv.org/abs/1904.06472v2
PDF https://arxiv.org/pdf/1904.06472v2.pdf
PWC https://paperswithcode.com/paper/190406472
Repo https://github.com/qinguangjun/conversational-datasets
Framework tf
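
The '1-of-100 accuracy' metric ranks the true response against 99 distractors and counts how often it comes first. A minimal sketch, assuming the true response's score sits at index 0 of each row (a layout convention assumed here, not taken from the repository):

```python
import numpy as np

def one_of_100_accuracy(scores):
    """scores: (N, 100) array where scores[i, 0] is the model's score for
    the true response to context i and the rest are distractor scores."""
    scores = np.asarray(scores)
    return float(np.mean(scores.argmax(axis=1) == 0))
```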

JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

Title JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
Authors Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer
Abstract Interactive programming with interleaved code snippet cells and natural language markdown is recently gaining popularity in the form of Jupyter notebooks, which accelerate prototyping and collaboration. To study code generation conditioned on a long context history, we present JuICe, a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. Compared with existing contextual code generation datasets, JuICe provides refined human-curated data, open-domain code, and an order of magnitude more training data. Using JuICe, we train models for two tasks: (1) generation of the API call sequence in a code cell, and (2) full code cell generation, both conditioned on the NL-Code history up to a particular code cell. Experiments using current baseline code generation models show that both context and distant supervision aid in generation, and that the dataset is challenging for current systems.
Tasks Code Generation
Published 2019-10-05
URL https://arxiv.org/abs/1910.02216v2
PDF https://arxiv.org/pdf/1910.02216v2.pdf
PWC https://paperswithcode.com/paper/juice-a-large-scale-distantly-supervised
Repo https://github.com/rajasagashe/juice
Framework none