April 3, 2020

Paper Group AWR 73

Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!. Adaptive Propagation Graph Convolutional Network. Constant-Delay Enumeration for Nondeterministic Document Spanners. The Two-Pass Softmax Algorithm. Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting. Predict …

Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!


Title	Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!
Authors	Claudia Schulz, Damir Juric
Abstract	A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology, in particular whether the close relationship of semantically similar medical terms is encoded in these embeddings. To date, only small datasets for testing medical term similarity are available, which does not allow conclusions to be drawn about the generalisability of embeddings to the enormous number of medical terms used by doctors. We present multiple automatically created large-scale medical term similarity datasets and confirm their high quality in an annotation study with doctors. We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques. Our results show that current embeddings are limited in their ability to adequately encode medical terms. The novel datasets thus form a challenging new benchmark for the development of medical embeddings able to accurately represent the whole medical terminology.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.11082v1
PDF	https://arxiv.org/pdf/2003.11082v1.pdf
PWC	https://paperswithcode.com/paper/can-embeddings-adequately-represent-medical
Repo	https://github.com/babylonhealth/medisim
Framework	none
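
The evaluation described in this abstract combines a word vector aggregation technique with a vector similarity metric. The following is a minimal sketch of one such combination, assuming mean pooling and cosine similarity; the toy vectors and the function names are made up for illustration and are not taken from the medisim repository.

```python
import math

# Toy 3-dimensional word vectors; the values are invented for illustration only.
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.2, 0.8, 0.1],
    "myocardial": [0.8, 0.2, 0.1],
    "infarction": [0.3, 0.7, 0.2],
    "broken": [0.0, 0.1, 0.9],
    "leg": [0.1, 0.0, 0.8],
}


def mean_pool(term, vectors):
    """Aggregate a multi-word term into one vector by averaging its word vectors."""
    known = [vectors[w] for w in term.lower().split() if w in vectors]
    if not known:
        raise KeyError(f"no known words in term: {term!r}")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_similarity(term_a, term_b, vectors=TOY_VECTORS):
    """Similarity of two terms: mean-pool each term, then compare with cosine."""
    return cosine(mean_pool(term_a, vectors), mean_pool(term_b, vectors))
```

With these toy vectors, `term_similarity("heart attack", "myocardial infarction")` comes out higher than `term_similarity("heart attack", "broken leg")`, which is the kind of ordering a term similarity dataset would check for.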

Adaptive Propagation Graph Convolutional Network


Title	Adaptive Propagation Graph Convolutional Network
Authors	Indro Spinelli, Simone Scardapane, Aurelio Uncini
Abstract	Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertex-wise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: (i) how to design a differentiable exchange protocol (e.g., a 1-hop Laplacian smoothing in the original GCN), and (ii) how to characterize the trade-off in complexity with respect to the local updates. In this paper, we show that state-of-the-art results can be achieved by adapting the number of communication steps independently at every node. In particular, we endow each node with a halting unit (inspired by Graves’ adaptive computation time) that after every exchange decides whether to continue communicating or not. We show that the proposed adaptive propagation GCN (AP-GCN) achieves superior or similar results to the best proposed models so far on a number of benchmarks, while requiring a small overhead in terms of additional parameters. We also investigate a regularization term to enforce an explicit trade-off between communication and accuracy. The code for the AP-GCN experiments is released as an open-source library.
Tasks
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10306v1
PDF	https://arxiv.org/pdf/2002.10306v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-propagation-graph-convolutional
Repo	https://github.com/spindro/AP-GCN
Framework	pytorch

Constant-Delay Enumeration for Nondeterministic Document Spanners


Title	Constant-Delay Enumeration for Nondeterministic Document Spanners
Authors	Antoine Amarilli, Pierre Bourhis, Stefan Mengel, Matthias Niewerth
Abstract	We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency on the size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, in particular for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.
Tasks
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02576v1
PDF	https://arxiv.org/pdf/2003.02576v1.pdf
PWC	https://paperswithcode.com/paper/constant-delay-enumeration-for-1
Repo	https://github.com/PoDMR/enum-spanner-rs
Framework	none

The Two-Pass Softmax Algorithm


Title	The Two-Pass Softmax Algorithm
Authors	Marat Dukhan, Artsiom Ablavatski
Abstract	The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass to compute the normalization constant, and two other passes to compute outputs from normalized inputs. We analyze two variants of the Three-Pass algorithm and demonstrate that in a well-optimized implementation on HPC-class processors performance of all three passes is limited by memory bandwidth. We then present a novel algorithm for softmax computation in just two passes. The proposed Two-Pass algorithm avoids both numerical overflow and the extra normalization pass by employing an exotic representation for intermediate values, where each value is represented as a pair of floating-point numbers: one representing the “mantissa” and another representing the “exponent”. Performance evaluation demonstrates that on out-of-cache inputs on an Intel Skylake-X processor the new Two-Pass algorithm outperforms the traditional Three-Pass algorithm by up to 28% in AVX512 implementation, and by up to 18% in AVX2 implementation. The proposed Two-Pass algorithm also outperforms the traditional Three-Pass algorithm on Intel Broadwell and AMD Zen 2 processors. To foster reproducibility, we released an open-source implementation of the new Two-Pass Softmax algorithm and other experiments in this paper as a part of XNNPACK library at GitHub.com/google/XNNPACK.
Tasks
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04438v1
PDF	https://arxiv.org/pdf/2001.04438v1.pdf
PWC	https://paperswithcode.com/paper/the-two-pass-softmax-algorithm
Repo	https://github.com/google/XNNPACK
Framework	tf
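
The pair representation mentioned in this abstract can be illustrated in a few lines. The sketch below is an assumption-laden, scalar Python rendering, not the vectorized XNNPACK code: each exp(x) is stored as a pair (m, e) with exp(x) = m * 2**e, so the running sum never overflows and no separate maximum pass is needed. The function names are hypothetical, and the three-pass baseline uses the usual max-subtraction formulation.

```python
import math

LOG2E = math.log2(math.e)
LN2 = math.log(2.0)


def exp_pair(x):
    """Return (m, e) with exp(x) == m * 2**e and m roughly in [0.707, 1.414]."""
    e = round(x * LOG2E)          # integer "exponent"
    m = math.exp(x - e * LN2)     # "mantissa"; its argument lies in [-ln2/2, ln2/2]
    return m, e


def softmax_three_pass(xs):
    """Baseline: max-subtraction softmax, reading the input three times."""
    x_max = max(xs)                                    # pass 1: maximum
    total = sum(math.exp(x - x_max) for x in xs)       # pass 2: normalization constant
    return [math.exp(x - x_max) / total for x in xs]   # pass 3: outputs


def softmax_two_pass(xs):
    """Softmax reading the input twice, with the sum kept as an (m, e) pair."""
    # Pass 1: accumulate sum(exp(x)) as m_sum * 2**e_sum.
    m_sum, e_sum = 0.0, None
    for x in xs:
        m, e = exp_pair(x)
        e_new = e if e_sum is None else max(e_sum, e)
        # Rescale both addends to the common exponent; the shifts are <= 0,
        # so ldexp can only underflow to 0.0, never overflow.
        prev = 0.0 if e_sum is None else math.ldexp(m_sum, e_sum - e_new)
        m_sum = prev + math.ldexp(m, e - e_new)
        e_sum = e_new
    # Pass 2: recompute each pair and divide by the accumulated sum.
    return [math.ldexp(exp_pair(x)[0] / m_sum, exp_pair(x)[1] - e_sum) for x in xs]
```

For an input such as `[1000.0, 999.0, -1000.0, 0.5]`, where a naive `math.exp(1000.0)` would overflow, both functions return the same probabilities up to rounding. Whether two passes are actually faster depends on memory bandwidth, which a scalar Python sketch cannot show.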

Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting


Title	Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting
Authors	Qiquan Shi, Jiaming Yin, Jiajun Cai, Andrzej Cichocki, Tatsuya Yokota, Lei Chen, Mingxuan Yuan, Jia Zeng
Abstract	This work proposes a novel approach for multiple time series forecasting. First, the multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) is explicitly used on consecutive core tensors to predict future samples. In this manner, the proposed approach tactically incorporates the unique advantages of MDT tensorization (to exploit mutual correlations) and tensor ARIMA coupled with low-rank Tucker decomposition into a unified framework. This framework exploits the low-rank structure of block Hankel tensors in the embedded space and captures the intrinsic correlations among multiple time series, which can thus improve the forecasting results, especially for multiple short time series. Experiments conducted on three public datasets and two industrial datasets verify that the proposed BHT-ARIMA effectively improves forecasting accuracy and reduces computational cost compared with the state-of-the-art methods.
Tasks	Time Series, Time Series Forecasting
Published	2020-02-25
URL	https://arxiv.org/abs/2002.12135v1
PDF	https://arxiv.org/pdf/2002.12135v1.pdf
PWC	https://paperswithcode.com/paper/block-hankel-tensor-arima-for-multiple-short
Repo	https://github.com/yokotatsuya/BHT-ARIMA
Framework	none
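
The first step named in this abstract, delay embedding into Hankel structure, can be sketched for the simplest case. The code below assumes the embedding is applied along the time axis only, with a hypothetical delay parameter `tau`; it does not cover the Tucker decomposition or the tensor ARIMA steps, and the function names are not taken from the BHT-ARIMA repository.

```python
def hankelize(series, tau):
    """Delay-embed one series of length T into a tau x (T - tau + 1) Hankel matrix.

    Entry [i][j] equals series[i + j], so every anti-diagonal is constant.
    """
    length = len(series)
    if not 1 <= tau <= length:
        raise ValueError("tau must be between 1 and len(series)")
    return [[series[i + j] for j in range(length - tau + 1)] for i in range(tau)]


def block_hankel(multi_series, tau):
    """Stack the Hankel matrices of several series into a 3-way nested list.

    The result has shape (number of series) x tau x (T - tau + 1).
    """
    return [hankelize(series, tau) for series in multi_series]
```

For example, `hankelize([1, 2, 3, 4, 5], 3)` gives `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`. The repeated entries are what gives the embedded data its low-rank structure, which is the property the abstract says the method exploits.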


Predictive analysis of Bitcoin price considering social sentiments


Title	Predictive analysis of Bitcoin price considering social sentiments
Authors	Pratikkumar Prajapati
Abstract	We report on the use of sentiment analysis on news and social media to analyze and predict the price of Bitcoin. Bitcoin is the leading cryptocurrency and has the highest market capitalization among digital currencies. Predicting Bitcoin values may help understand and predict potential market movement and future growth of the technology. Unlike (mostly) repeating phenomena like weather, cryptocurrency values do not follow a repeating pattern, and the past value of Bitcoin alone does not reveal its future value. Humans follow general sentiments and technical analysis to invest in the market. Hence, considering people’s sentiment can give a good degree of prediction. We focus on using social sentiment as a feature to predict future Bitcoin value, and in particular, consider Google News and Reddit posts. We find that social sentiment gives a good estimate of how future Bitcoin values may move. We achieve the lowest test RMSE of 434.87 using an LSTM that takes as inputs the historical price of various cryptocurrencies, the sentiment of news articles and the sentiment of Reddit posts.
Tasks	Sentiment Analysis
Published	2020-01-16
URL	https://arxiv.org/abs/2001.10343v1
PDF	https://arxiv.org/pdf/2001.10343v1.pdf
PWC	https://paperswithcode.com/paper/predictive-analysis-of-bitcoin-price
Repo	https://github.com/pratikpv/predicting_bitcoin_market
Framework	none
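
The test RMSE reported in this abstract is the root mean squared error between predicted and actual prices. A minimal sketch of the metric follows; the function name and the sample numbers in the usage note are illustrative and not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For instance, `rmse([0.0, 0.0], [3.0, 4.0])` is the square root of (9 + 16) / 2, about 3.54. Because the errors are squared before averaging, RMSE is in the same unit as the price and is dominated by the largest misses.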

Semantically-Guided Representation Learning for Self-Supervised Monocular Depth


Title	Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
Authors	Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon
Abstract	Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to leverage this semantic structure more directly to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process to overcome a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic category.
Tasks	Depth Estimation, Monocular Depth Estimation, Representation Learning, Semantic Segmentation
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12319v1
PDF	https://arxiv.org/pdf/2002.12319v1.pdf
PWC	https://paperswithcode.com/paper/semantically-guided-representation-learning-1
Repo	https://github.com/TRI-ML/packnet-sfm
Framework	none

PM2.5-GNN: A Domain Knowledge Enhanced Graph Neural Network For PM2.5 Forecasting


Title	PM2.5-GNN: A Domain Knowledge Enhanced Graph Neural Network For PM2.5 Forecasting
Authors	Shuo Wang, Yanran Li, Jiang Zhang, Qingye Meng, Lingwei Meng, Fei Gao
Abstract	When predicting PM2.5 concentrations, it is necessary to consider complex information sources since the concentrations are influenced by various factors over a long period. In this paper, we identify a set of critical domain knowledge for PM2.5 forecasting and develop a novel graph-based model, PM2.5-GNN, capable of capturing long-term dependencies. On a real-world dataset, we validate the effectiveness of the proposed model and examine its ability to capture both fine-grained and long-term influences in the PM2.5 process. The proposed PM2.5-GNN has also been deployed online to provide a free forecasting service.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.12898v1
PDF	https://arxiv.org/pdf/2002.12898v1.pdf
PWC	https://paperswithcode.com/paper/pm25-gnn-a-domain-knowledge-enhanced-graph
Repo	https://github.com/shawnwang-tech/PM2.5-GNN
Framework	pytorch


Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem


Title	Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem
Authors	Liu Liu, Dylan Campbell, Hongdong Li, Dingfu Zhou, Xibin Song, Ruigang Yang
Abstract	Conventional absolute camera pose estimation via a Perspective-n-Point (PnP) solver often assumes that the correspondences between 2D image pixels and 3D points are given. When the correspondences between 2D and 3D points are not known a priori, the task becomes the much more challenging blind PnP problem. This paper proposes a deep CNN model which simultaneously solves for both the 6-DoF absolute camera pose and 2D–3D correspondences. Our model comprises three neural modules connected in sequence. First, a two-stream PointNet-inspired network is applied directly to both the 2D image keypoints and the 3D scene points in order to extract discriminative point-wise features harnessing both local and contextual information. Second, a global feature matching module is employed to estimate a matchability matrix among all 2D–3D pairs. Third, the obtained matchability matrix is fed into a classification module to disambiguate inlier matches. The entire network is trained end-to-end, followed by a robust model fitting (P3P-RANSAC) at test time only to recover the 6-DoF camera pose. Extensive tests on both real and simulated data have shown that our method substantially outperforms existing approaches, and is capable of processing thousands of points a second with state-of-the-art accuracy.
Tasks
Published	2020-03-15
URL	https://arxiv.org/abs/2003.06752v1
PDF	https://arxiv.org/pdf/2003.06752v1.pdf
PWC	https://paperswithcode.com/paper/learning-2d-3d-correspondences-to-solve-the
Repo	https://github.com/Liumouliu/Deep_blind_PnP
Framework	none

Torch-Struct: Deep Structured Prediction Library


Title	Torch-Struct: Deep Structured Prediction Library
Authors	Alexander M. Rush
Abstract	The literature on structured prediction for NLP describes a rich collection of distributions and algorithms over sequences, segmentations, alignments, and trees; however, these algorithms are difficult to utilize in deep learning frameworks. We introduce Torch-Struct, a library for structured prediction designed to take advantage of and integrate with vectorized, auto-differentiation based frameworks. Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API that connects to any deep learning model. The library utilizes batched, vectorized operations and exploits auto-differentiation to produce readable, fast, and testable code. Internally, we also include a number of general-purpose optimizations to provide cross-algorithm efficiency. Experiments show significant performance gains over fast baselines and case-studies demonstrate the benefits of the library. Torch-Struct is available at https://github.com/harvardnlp/pytorch-struct.
Tasks	Structured Prediction
Published	2020-02-03
URL	https://arxiv.org/abs/2002.00876v1
PDF	https://arxiv.org/pdf/2002.00876v1.pdf
PWC	https://paperswithcode.com/paper/torch-struct-deep-structured-prediction
Repo	https://github.com/harvardnlp/pytorch-struct
Framework	pytorch
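
To make concrete the kind of algorithm over sequences that this abstract refers to, here is a small pure-Python sketch of the log-partition function of a linear-chain model, computed by the forward recursion and checked against brute-force enumeration. It deliberately does not use or imitate the Torch-Struct API; the function names and the layout of `log_potentials` are assumptions made for this example.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_partition(log_potentials):
    """Forward recursion for a linear chain.

    log_potentials[t][i][j] scores state i at position t followed by
    state j at position t + 1. Cost is O(T * S^2) for S states.
    """
    n_states = len(log_potentials[0])
    alpha = [0.0] * n_states  # no unary scores on the first position
    for step in log_potentials:
        alpha = [
            logsumexp([alpha[i] + step[i][j] for i in range(n_states)])
            for j in range(n_states)
        ]
    return logsumexp(alpha)


def log_partition_brute_force(log_potentials):
    """Same quantity by enumerating all S^(T+1) state sequences."""
    n_states = len(log_potentials[0])
    length = len(log_potentials) + 1
    scores = [
        sum(log_potentials[t][path[t]][path[t + 1]] for t in range(length - 1))
        for path in product(range(n_states), repeat=length)
    ]
    return logsumexp(scores)
```

The two functions agree on small inputs, but only the recursion scales to long sequences. A library of the kind described would express the same recursion with batched tensor operations and obtain marginals from it by automatic differentiation.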

FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings


Title	FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings
Authors	Niels Ole Salscheider
Abstract	Most state-of-the-art object detectors output multiple detections per object. The duplicates are removed in a post-processing step called Non-Maximum Suppression. Classical Non-Maximum Suppression has shortcomings in scenes that contain objects with high overlap: the idea of this heuristic is that a high bounding box overlap corresponds to a high probability of having a duplicate. We propose FeatureNMS to solve this problem. FeatureNMS recognizes duplicates not only based on the intersection over union between bounding boxes, but also based on the difference of feature vectors. These feature vectors can encode more information like visual appearance. Our approach outperforms classical NMS and derived approaches and achieves state-of-the-art performance.
Tasks
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07662v1
PDF	https://arxiv.org/pdf/2002.07662v1.pdf
PWC	https://paperswithcode.com/paper/featurenms-non-maximum-suppression-by
Repo	https://github.com/fzi-forschungszentrum-informatik/NNAD
Framework	none
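
The idea in this abstract, deciding duplicates from both box overlap and feature distance, can be sketched as a greedy filter. The decision rule used here (suppress only when the IoU is high and the feature vectors are close) and both thresholds are assumptions chosen for illustration; they are not the exact rule or the values from the paper or the NNAD repository.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def feature_aware_nms(detections, iou_thresh=0.5, dist_thresh=1.0):
    """Greedy suppression over detections given as (box, score, feature) tuples.

    A detection is dropped only if some higher-scoring kept detection both
    overlaps it (IoU >= iou_thresh) and has a nearby feature vector
    (Euclidean distance <= dist_thresh). Returns the kept indices.
    """
    order = sorted(range(len(detections)), key=lambda i: detections[i][1], reverse=True)
    kept = []
    for i in order:
        box, _, feature = detections[i]
        is_duplicate = any(
            iou(box, detections[k][0]) >= iou_thresh
            and math.dist(feature, detections[k][2]) <= dist_thresh
            for k in kept
        )
        if not is_duplicate:
            kept.append(i)
    return kept
```

Setting `dist_thresh=math.inf` makes the feature test always pass, which reduces the function to classical IoU-only suppression. With a finite threshold, two heavily overlapping boxes whose features differ are both kept, which is the crowded-scene case the abstract describes.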

Noisy-Input Entropy Search for Efficient Robust Bayesian Optimization


Title	Noisy-Input Entropy Search for Efficient Robust Bayesian Optimization
Authors	Lukas P. Fröhlich, Edgar D. Klenske, Julia Vinogradska, Christian Daniel, Melanie N. Zeilinger
Abstract	We consider the problem of robust optimization within the well-established Bayesian optimization (BO) framework. While BO is intrinsically robust to noisy evaluations of the objective function, standard approaches do not consider the case of uncertainty about the input parameters. In this paper, we propose Noisy-Input Entropy Search (NES), a novel information-theoretic acquisition function that is designed to find robust optima for problems with both input and measurement noise. NES is based on the key insight that the robust objective in many cases can be modeled as a Gaussian process; however, it cannot be observed directly. We evaluate NES on several benchmark problems from the optimization literature and from engineering. The results show that NES reliably finds robust optima, outperforming existing methods from the literature on all benchmarks.
Tasks
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02820v1
PDF	https://arxiv.org/pdf/2002.02820v1.pdf
PWC	https://paperswithcode.com/paper/noisy-input-entropy-search-for-efficient
Repo	https://github.com/boschresearch/NoisyInputEntropySearch
Framework	none

Unsupervised Question Decomposition for Question Answering


Title	Unsupervised Question Decomposition for Question Answering
Authors	Ethan Perez, Patrick Lewis, Wen-tau Yih, Kyunghyun Cho, Douwe Kiela
Abstract	We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HotpotQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.
Tasks	Question Answering
Published	2020-02-22
URL	https://arxiv.org/abs/2002.09758v2
PDF	https://arxiv.org/pdf/2002.09758v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-question-decomposition-for
Repo	https://github.com/facebookresearch/UnsupervisedDecomposition
Framework	pytorch

ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs


Title	ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs
Authors	Zuohui Fu, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, Gerard de Melo
Abstract	A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data.
Tasks	Cross-Lingual Transfer, Sentence Embedding, Transfer Learning
Published	2020-01-29
URL	https://arxiv.org/abs/2001.11121v1
PDF	https://arxiv.org/pdf/2001.11121v1.pdf
PWC	https://paperswithcode.com/paper/absent-cross-lingual-sentence-representation
Repo	https://github.com/zuohuif/ABSent
Framework	none

Synthetic Magnetic Resonance Images with Generative Adversarial Networks


Title	Synthetic Magnetic Resonance Images with Generative Adversarial Networks
Authors	Antoine Delplace
Abstract	Data augmentation is essential for medical research to increase the size of training datasets and achieve better results. In this work, we experiment with three GAN architectures with different loss functions to generate new brain MRIs. The results show the importance of hyperparameter tuning and of using a mini-batch similarity layer in the Discriminator and a gradient penalty in the loss function to achieve convergence with high quality and realism. Moreover, a huge amount of computation time is needed to generate images that are indistinguishable from the original dataset.
Tasks	Data Augmentation
Published	2020-01-17
URL	https://arxiv.org/abs/2002.02527v1
PDF	https://arxiv.org/pdf/2002.02527v1.pdf
PWC	https://paperswithcode.com/paper/synthetic-magnetic-resonance-images-with
Repo	https://github.com/antoinedelplace/MRI-Generation
Framework	tf