Paper Group AWR 152
Papers in this group: Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems; One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases; Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints; Deep Continuous Clustering; Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features; Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions; Modeling Semantic Plausibility by Injecting World Knowledge; Image Segmentation using Sparse Subset Selection; WAIC, but Why? Generative Ensembles for Robust Anomaly Detection; Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs; A Machine Learning Approach for Virtual Flow Metering and Forecasting; Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction; NE-Table: A Neural key-value table for Named Entities; A Scalable Discrete-Time Survival Model for Neural Networks; LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling.
Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems
Title | Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems |
Authors | Filipe Rodrigues, Francisco C. Pereira |
Abstract | Spatio-temporal problems are ubiquitous and of vital importance in many research fields. Despite the potential already demonstrated by deep learning methods in modeling spatio-temporal data, typical approaches tend to focus solely on conditional expectations of the output variables being modeled. In this paper, we propose a multi-output multi-quantile deep learning approach for jointly modeling several conditional quantiles together with the conditional expectation as a way to provide a more complete “picture” of the predictive density in spatio-temporal problems. Using two large-scale datasets from the transportation domain, we empirically demonstrate that, by approaching the quantile regression problem from a multi-task learning perspective, it is possible to solve the embarrassing problem of quantile crossing, while simultaneously significantly outperforming state-of-the-art quantile regression methods. Moreover, we show that jointly modeling the mean and several conditional quantiles not only provides a rich description of the predictive density that can capture heteroscedastic properties at a negligible computational overhead, but also leads to improved predictions of the conditional expectation, due to the extra information and the regularization effect induced by the added quantiles. |
Tasks | Multi-Task Learning |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08798v1 |
PDF | http://arxiv.org/pdf/1808.08798v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-expectation-deep-joint-mean-and |
Repo | https://github.com/fmpr/DeepJMQR |
Framework | tf |
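As a concrete illustration of the joint objective the abstract describes, here is a minimal NumPy sketch that combines an MSE term for the mean head with pinball (tilted absolute) losses for the quantile heads. The equal weighting of the terms and the head layout are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Tilted absolute loss for quantile level tau in (0, 1)."""
    diff = y - y_hat
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def joint_loss(y, mean_pred, quantile_preds, taus):
    """MSE on the mean head plus a pinball loss per quantile head.

    Equal weighting of the terms is an assumption, not the paper's tuning.
    """
    mse = np.mean((y - mean_pred) ** 2)
    return mse + sum(pinball_loss(y, q, t) for q, t in zip(quantile_preds, taus))

# Toy usage: a mean head plus three quantile heads (10th, 50th, 90th percentile).
y = np.random.randn(100)
loss = joint_loss(y, np.zeros(100), [np.zeros(100)] * 3, [0.1, 0.5, 0.9])
```

Minimizing all terms through shared network layers is what lets the added quantile heads act as a regularizer on the mean prediction, as the abstract notes.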
One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases
Title | One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases |
Authors | Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Peter Brusilovsky, Daqing He, Adam Trischler |
Abstract | Different texts naturally correspond to different numbers of keyphrases; this desideratum is largely missing from existing neural keyphrase generation models. In this study, we address the problem from both modeling and evaluation perspectives. We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques that manipulate decoder hidden states. In contrast to previous approaches, our model is capable of generating a variable number of diverse keyphrases. We further propose two evaluation metrics tailored towards variable-number generation. We also introduce a new dataset (StackEX) that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets. |
Tasks | |
Published | 2018-10-11 |
URL | https://arxiv.org/abs/1810.05241v2 |
PDF | https://arxiv.org/pdf/1810.05241v2.pdf |
PWC | https://paperswithcode.com/paper/generating-diverse-numbers-of-diverse |
Repo | https://github.com/memray/OpenNMT-kpg-release |
Framework | pytorch |
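To make the delimiter-separated generation concrete, here is a small sketch of post-processing one decoded token sequence into a variable-size keyphrase set and scoring it with an F1 that uses no fixed cutoff. The `<sep>`/`<eos>` token names and this exact metric definition are assumptions, not necessarily the paper's.

```python
def split_keyphrases(decoded_tokens, delimiter="<sep>", eos="<eos>"):
    """Recover a variable-size keyphrase set from one delimiter-separated sequence."""
    phrases, current = [], []
    for tok in decoded_tokens:
        if tok == eos:
            break
        if tok == delimiter:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

def f1_at_variable_number(predicted, gold):
    """F1 over the full predicted set, with no fixed cutoff k."""
    pred, gold = set(predicted), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```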
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Title | Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints |
Authors | Ashutosh Baheti, Alan Ritter, Jiwei Li, Bill Dolan |
Abstract | Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content-rich responses, based on a model of syntax and topics (Griffiths et al., 2005) and on semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01215v1 |
PDF | http://arxiv.org/pdf/1809.01215v1.pdf |
PWC | https://paperswithcode.com/paper/generating-more-interesting-responses-in |
Repo | https://github.com/felicienveldema/IR2 |
Framework | none |
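A minimal sketch of decoding with distributional constraints, assuming the constraints enter as additive terms when reranking beam candidates. Here `topic_score` and `sim_score` stand in for the paper's topic-model and sentence-similarity components, and the linear combination with weights `l1`, `l2` is an illustrative assumption.

```python
def rerank(candidates, log_p, topic_score, sim_score, l1=1.0, l2=1.0):
    """Pick the beam candidate maximizing likelihood plus constraint scores."""
    return max(candidates,
               key=lambda c: log_p(c) + l1 * topic_score(c) + l2 * sim_score(c))
```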
Deep Continuous Clustering
Title | Deep Continuous Clustering |
Authors | Sohil Atul Shah, Vladlen Koltun |
Abstract | Clustering high-dimensional datasets is hard because interpoint distances become less informative in high-dimensional spaces. We present a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly. The data is embedded into a lower-dimensional space by a deep autoencoder. The autoencoder is optimized as part of the clustering process. The resulting network produces clustered data. The presented approach does not rely on prior knowledge of the number of ground-truth clusters. Joint nonlinear dimensionality reduction and clustering are formulated as optimization of a global continuous objective. We thus avoid discrete reconfigurations of the objective that characterize prior clustering algorithms. Experiments on datasets from multiple domains demonstrate that the presented algorithm outperforms state-of-the-art clustering schemes, including recent methods that use deep networks. |
Tasks | Dimensionality Reduction |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01449v1 |
PDF | http://arxiv.org/pdf/1803.01449v1.pdf |
PWC | https://paperswithcode.com/paper/deep-continuous-clustering |
Repo | https://github.com/waynezhanghk/gacluster |
Framework | pytorch |
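The joint objective can be pictured as an autoencoder reconstruction term plus a robust continuous-clustering term over embeddings. The simplified NumPy sketch below (Geman-McClure penalty, mutual-kNN edge list, equal data-term weighting) is an assumption-laden illustration of the idea, not the paper's exact formulation.

```python
import numpy as np

def geman_mcclure(r, mu):
    """Robust penalty: roughly quadratic for small residuals, saturates for large ones."""
    return mu * r ** 2 / (mu + r ** 2)

def dcc_objective(x, x_rec, z, rep, edges, lam=1.0, mu=1.0):
    """Reconstruction plus a continuous clustering term over embeddings.

    z: (n, d) embeddings; rep: (n, d) per-point representatives; edges: index
    pairs from a mutual-kNN graph. A simplified sketch of the joint objective.
    """
    recon = np.sum((x - x_rec) ** 2)
    data_term = np.sum((z - rep) ** 2)
    pair_term = sum(geman_mcclure(np.linalg.norm(rep[i] - rep[j]), mu)
                    for i, j in edges)
    return recon + data_term + lam * pair_term
```

As representatives of connected points are pulled together, clusters emerge from a single continuous optimization, with no discrete reassignment steps.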
Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features
Title | Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features |
Authors | Julius von Kügelgen, Alexander Mey, Marco Loog |
Abstract | Current methods for covariate-shift adaptation use unlabelled data to compute importance weights or domain-invariant features, while the final model is trained on labelled data only. Here, we consider a particular case of covariate shift which allows us also to learn from unlabelled data, that is, combining adaptation with semi-supervised learning. Using ideas from causality, we argue that this requires learning with both causes, $X_C$, and effects, $X_E$, of a target variable, $Y$, and show how this setting leads to what we call a semi-generative model, $P(Y, X_E \mid X_C; \theta)$. Our approach is robust to domain shifts in the distribution of causal features and leverages unlabelled data by learning a direct map from causes to effects. Experiments on synthetic data demonstrate significant improvements in classification over purely-supervised and importance-weighting baselines. |
Tasks | Domain Adaptation |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07879v2 |
PDF | http://arxiv.org/pdf/1807.07879v2.pdf |
PWC | https://paperswithcode.com/paper/semi-generative-modelling-covariate-shift |
Repo | https://github.com/Juliusvk/Semi-Generative-Modelling |
Framework | none |
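For a discrete label $Y$, the semi-generative factorization lets unlabelled points contribute through the marginal $P(X_E \mid X_C) = \sum_y P(y \mid X_C)\,P(X_E \mid y, X_C)$. A minimal sketch, assuming the two conditional factors have already been evaluated for a given point as per-class arrays:

```python
import numpy as np

def labelled_ll(p_y_given_xc, p_xe_given_y_xc, y):
    """log P(y, x_E | x_C) for a labelled point; arrays are indexed by class."""
    return np.log(p_y_given_xc[y]) + np.log(p_xe_given_y_xc[y])

def unlabelled_ll(p_y_given_xc, p_xe_given_y_xc):
    """log P(x_E | x_C) for an unlabelled point: marginalize out the label."""
    return np.log(np.sum(p_y_given_xc * p_xe_given_y_xc))
```

Summing both kinds of terms over the dataset gives a single likelihood in which unlabelled data directly constrains the cause-to-effect map.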
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Title | Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions |
Authors | Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen |
Abstract | Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks, such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation, and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, and distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn’t offer optimal performance for a user’s particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, and (3) a compilation cache populated by an autotuner. [Abstract cutoff] |
Tasks | Scene Understanding |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04730v3 |
PDF | http://arxiv.org/pdf/1802.04730v3.pdf |
PWC | https://paperswithcode.com/paper/tensor-comprehensions-framework-agnostic-high |
Repo | https://github.com/AIwithSwift/TFWorld2019-SwiftIn3Hours |
Framework | tf |
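The comprehension language writes an operator as an index expression with an explicit reduction, and `np.einsum` captures the same semantics in Python. The TC line in the comment follows the matrix-multiplication example commonly shown for the language and is quoted from memory, so treat it as indicative syntax rather than a verified snippet.

```python
import numpy as np

# A Tensor Comprehension along the lines of
#   def mm(float(M,K) A, float(K,N) B) -> (C) { C(m,n) +=! A(m,k) * B(k,n) }
# declares output indices (m, n), reduces over the free index k with +=!,
# and infers C's shape. np.einsum expresses the same contraction:
A, B = np.random.rand(4, 5), np.random.rand(5, 3)
C = np.einsum("mk,kn->mn", A, B)  # C(m,n) = sum_k A(m,k) * B(k,n)
assert C.shape == (4, 3)
```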
Modeling Semantic Plausibility by Injecting World Knowledge
Title | Modeling Semantic Plausibility by Injecting World Knowledge |
Authors | Su Wang, Greg Durrett, Katrin Erk |
Abstract | Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since the latter is never attested. However, both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as “man swallow paintball”. Simple models based on distributional representations perform poorly on this task, despite doing well on selectional preference, but injecting manually elicited knowledge about entity properties provides a substantial performance boost. Our error analysis shows that our new dataset is a great testbed for semantic plausibility models: more sophisticated knowledge representation and propagation could address many of the remaining errors. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00619v3 |
PDF | http://arxiv.org/pdf/1804.00619v3.pdf |
PWC | https://paperswithcode.com/paper/modeling-semantic-plausibility-by-injecting |
Repo | https://github.com/suwangcompling/Modeling-Semantic-Plausibility-NAACL18 |
Framework | none |
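One way to picture "injecting world knowledge" is to concatenate manually elicited entity-property vectors onto the distributional embeddings before classification. The sketch below uses hypothetical lookup tables (`emb`, `props`) and is an illustration of the injection idea, not the paper's exact featurization.

```python
import numpy as np

def plausibility_features(emb, props, triple):
    """Build features for a subject-verb-object event like ("man", "swallow", "paintball").

    emb maps words to dense vectors; props maps nouns to manually elicited
    property vectors (e.g. size, rigidity). Both tables are assumptions.
    """
    s, v, o = triple
    return np.concatenate([emb[s], emb[v], emb[o], props[s], props[o]])
```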
Image Segmentation using Sparse Subset Selection
Title | Image Segmentation using Sparse Subset Selection |
Authors | Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar |
Abstract | In this paper, we present a new image segmentation method based on the concept of sparse subset selection. Starting with an over-segmentation, we adopt local spectral histogram features to encode the visual information of the small segments into high-dimensional vectors, called superpixel features. The superpixel features are then fed into a novel convex model which efficiently leverages the features to group the superpixels into a proper number of coherent regions. Our model automatically determines the optimal number of coherent regions and the assignment of superpixels that shapes the final segments. To solve our model, we propose a numerical algorithm based on the alternating direction method of multipliers (ADMM), whose iterations consist of two highly parallelizable sub-problems. We show that each sub-problem enjoys a closed-form solution, which makes the ADMM iterations computationally very efficient. Extensive experiments on benchmark image segmentation datasets demonstrate that our proposed method, in combination with an over-segmentation, can provide high-quality and competitive results compared to the existing state-of-the-art methods. |
Tasks | Semantic Segmentation |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02721v1 |
PDF | http://arxiv.org/pdf/1804.02721v1.pdf |
PWC | https://paperswithcode.com/paper/image-segmentation-using-sparse-subset |
Repo | https://github.com/mohsenkheirandishfard/IS4 |
Framework | none |
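The solver structure the abstract describes fits the standard scaled-form ADMM template for minimizing f(x) + g(z) subject to Ax + Bz = c. The sketch below leaves the two closed-form sub-problem solvers as user-supplied callables, since their exact form is specific to the paper's convex model; shapes and the fixed iteration count are simplifications.

```python
import numpy as np

def admm(step_x, step_z, A, B, c, iters=100):
    """Generic scaled-form ADMM loop for min f(x) + g(z)  s.t.  A x + B z = c."""
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    u = np.zeros(c.shape)            # scaled dual variable
    for _ in range(iters):
        x = step_x(z, u)             # argmin_x of the augmented Lagrangian
        z = step_z(x, u)             # argmin_z, also closed form in the paper
        u = u + A @ x + B @ z - c    # dual (multiplier) update
    return x, z
```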
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Title | WAIC, but Why? Generative Ensembles for Robust Anomaly Detection |
Authors | Hyunsun Choi, Eric Jang, Alexander A. Alemi |
Abstract | Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated from a different stochastic generator than the one used to generate the training data. One proposal to scale OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution and use it to reject unlikely inputs. However, likelihood models on natural data are themselves susceptible to OoD errors, and even assign large likelihoods to samples from other datasets. To mitigate this problem, we propose Generative Ensembles, which robustify density-based OoD detection by estimating the epistemic uncertainty of the likelihood model. We present a puzzling observation in need of an explanation: although likelihood measures cannot account for the typical set of a distribution, and therefore should not be suitable on their own for OoD detection, WAIC performs surprisingly well in practice. |
Tasks | Anomaly Detection |
Published | 2018-10-02 |
URL | https://arxiv.org/abs/1810.01392v4 |
PDF | https://arxiv.org/pdf/1810.01392v4.pdf |
PWC | https://paperswithcode.com/paper/waic-but-why-generative-ensembles-for-robust |
Repo | https://github.com/ericjang/odin |
Framework | pytorch |
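The WAIC score referenced in the title is easy to state: average the ensemble members' log-likelihoods of an input and subtract their variance, so inputs the members disagree on are penalized. A minimal sketch:

```python
import numpy as np

def waic_score(log_px):
    """WAIC(x) = E_theta[log p(x)] - Var_theta[log p(x)].

    log_px: shape (n_models,), each ensemble member's log-likelihood of the
    same input x. Lower scores flag likely out-of-distribution inputs.
    """
    return np.mean(log_px) - np.var(log_px)
```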
Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs
Title | Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs |
Authors | Patrick Wenzel, Qadeer Khan, Daniel Cremers, Laura Leal-Taixé |
Abstract | Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred, using semantic maps, to new weather conditions without the need to obtain new ground truth data. To this end, we propose to divide the task of vehicle control into two independent modules: a control module, which is trained on only one weather condition for which labeled steering data is available, and a perception module, which is used as an interface between new weather conditions and the fixed control module. To generate the semantic data needed to train the perception module, we propose to use a generative adversarial network (GAN)-based model to retrieve the semantic information for the new conditions in an unsupervised manner. We introduce a master-servant architecture, where the master model (semantic labels available) trains the servant model (semantic labels not available). We show that our proposed method, trained with ground truth data for a single weather condition, achieves results on the task of steering angle prediction comparable to those of an end-to-end model trained with ground truth data of 15 different weather conditions. |
Tasks | Self-Driving Cars |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01001v2 |
PDF | http://arxiv.org/pdf/1807.01001v2.pdf |
PWC | https://paperswithcode.com/paper/modular-vehicle-control-for-transferring |
Repo | https://github.com/pmwenzel/carla-domain-adaptation |
Framework | none |
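The modular split can be summarized in a few lines, with both modules as stand-in callables for the paper's networks; the point is that only the perception module changes with weather, while the control module stays fixed.

```python
def predict_steering(image, perception, control):
    """Two-module pipeline sketch.

    perception: maps raw pixels (any weather) to a semantic map, e.g. a
    GAN-based translator trained per condition. control: maps semantic maps
    to a steering angle, trained once on a single labeled weather condition.
    """
    semantic_map = perception(image)  # swapped per weather condition
    return control(semantic_map)      # fixed module, never retrained
```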
A Machine Learning Approach for Virtual Flow Metering and Forecasting
Title | A Machine Learning Approach for Virtual Flow Metering and Forecasting |
Authors | Nikolai Andrianov |
Abstract | We are concerned with robust and accurate forecasting of multiphase flow rates in wells and pipelines during oil and gas production. In practice, the possibility to physically measure the rates is often limited; besides, it is desirable to estimate future values of multiphase rates based on the previous behavior of the system. In this work, we demonstrate that a Long Short-Term Memory (LSTM) recurrent neural network is able not only to accurately estimate the multiphase rates at the current time (i.e., act as a virtual flow meter), but also to forecast the rates for a sequence of future time instants. For a synthetic severe slugging case, LSTM forecasts compare favorably with the results of hydrodynamical modeling. LSTM results for a realistic noisy dataset of a variable-rate well test show that the model can also successfully forecast multiphase rates for a system with changing flow patterns. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05698v1 |
PDF | http://arxiv.org/pdf/1802.05698v1.pdf |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-for-virtual-flow |
Repo | https://github.com/nikolai-andrianov/VFM |
Framework | none |
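A minimal PyTorch sketch of the virtual-flow-meter idea: an LSTM maps a window of past multiphase rates to the next values. The layer sizes and the three-rate input are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RateForecaster(nn.Module):
    """LSTM mapping a window of past multiphase rates to the next time step."""
    def __init__(self, n_rates=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_rates, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_rates)

    def forward(self, x):             # x: (batch, time, n_rates)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict the next step's rates

model = RateForecaster()
pred = model(torch.randn(8, 50, 3))   # 8 windows of 50 past time steps
```

Multi-step forecasting follows by feeding predictions back in as inputs, one step at a time.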
Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction
Title | Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction |
Authors | Po-Yu Kao, Thuyen Ngo, Angela Zhang, Jefferson W. Chen, B. S. Manjunath |
Abstract | This paper introduces a novel methodology to integrate human brain connectomics and parcellation for brain tumor segmentation and survival prediction. For segmentation, we utilize an existing brain parcellation atlas in the MNI152 1mm space and map this parcellation to each individual subject data. We use deep neural network architectures together with hard negative mining to achieve the final voxel level classification. For survival prediction, we present a new method for combining features from connectomics data, brain parcellation information, and the brain tumor mask. We leverage the average connectome information from the Human Connectome Project and map each subject brain volume onto this common connectome space. From this, we compute tractographic features that describe potential neural disruptions due to the brain tumor. These features are then used to predict the overall survival of the subjects. The main novelty in the proposed methods is the use of normalized brain parcellation data and tractography data from the human connectome project for analyzing MR images for segmentation and survival prediction. Experimental results are reported on the BraTS2018 dataset. |
Tasks | Brain Tumor Segmentation |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07716v3 |
PDF | http://arxiv.org/pdf/1807.07716v3.pdf |
PWC | https://paperswithcode.com/paper/brain-tumor-segmentation-and-tractographic |
Repo | https://github.com/pykao/BraTS2018-tumor-segmentation |
Framework | pytorch |
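Hard negative mining, which the abstract names as part of the voxel classifier, can be sketched as keeping all positive voxels plus only the highest-loss negatives. The 3:1 negative ratio below is a common convention, not necessarily the paper's setting.

```python
import numpy as np

def hard_negative_mask(loss, labels, neg_ratio=3):
    """Select voxels to backpropagate: all positives plus the hardest negatives.

    loss and labels are flattened per-voxel arrays of the same length.
    """
    pos = labels > 0
    n_neg = neg_ratio * max(int(pos.sum()), 1)
    neg_idx = np.where(~pos)[0]
    hardest = neg_idx[np.argsort(loss[neg_idx])[-n_neg:]]  # highest-loss negatives
    mask = pos.copy()
    mask[hardest] = True
    return mask
```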
NE-Table: A Neural key-value table for Named Entities
Title | NE-Table: A Neural key-value table for Named Entities |
Authors | Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos |
Abstract | Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution for this problem, and present empirical evaluations on: a) a structured Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a Reading-Comprehension task, which show that the proposed method can be effective in dealing with both in-vocabulary and OOV NEs. We create extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set, available at https://github.com/IBM/ne-table-datasets. |
Tasks | Goal-Oriented Dialog, Question Answering, Reading Comprehension, Word Embeddings |
Published | 2018-04-22 |
URL | https://arxiv.org/abs/1804.09540v2 |
PDF | https://arxiv.org/pdf/1804.09540v2.pdf |
PWC | https://paperswithcode.com/paper/named-entities-troubling-your-neural-methods |
Repo | https://github.com/IBM/ne-table-datasets |
Framework | none |
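The key-value idea can be sketched as a differentiable lookup: attend over stored NE keys with a query vector and return a mixture of their values, so entities written into the table at read time remain usable even when OOV for the word embeddings. A minimal sketch of the retrieval step only, under assumed shapes; the paper's full read/write machinery is richer.

```python
import numpy as np

def ne_table_lookup(query, keys, values):
    """Soft key-value retrieval over a table of named entities.

    query: (d,); keys, values: (n_entities, d) arrays populated as NEs are
    encountered. Returns a value mixture weighted by softmax attention.
    """
    scores = keys @ query
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ values
```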
A Scalable Discrete-Time Survival Model for Neural Networks
Title | A Scalable Discrete-Time Survival Model for Neural Networks |
Authors | Michael F. Gensheimer, Balasubramanian Narasimhan |
Abstract | There is currently great interest in applying neural networks to prediction tasks in medicine. It is important for predictive models to be able to use survival data, where each patient has a known follow-up time and event/censoring indicator. This avoids information loss when training the model and enables generation of predicted survival curves. In this paper, we describe a discrete-time survival model that is designed to be used with neural networks, which we refer to as Nnet-survival. The model is trained with the maximum likelihood method using minibatch stochastic gradient descent (SGD). The use of SGD enables rapid convergence and application to large datasets that do not fit in memory. The model is flexible, so that the baseline hazard rate and the effect of the input data on the hazard probability can vary with follow-up time. It has been implemented in the Keras deep learning framework, and source code for the model and several examples is available online. We demonstrate the performance of the model on both simulated and real data and compare it to the existing models Cox-nnet and DeepSurv. |
Tasks | Survival Analysis |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00917v3 |
PDF | http://arxiv.org/pdf/1805.00917v3.pdf |
PWC | https://paperswithcode.com/paper/a-scalable-discrete-time-survival-model-for |
Repo | https://github.com/MGensheimer/nnet-survival |
Framework | tf |
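The discrete-time likelihood behind such a model is compact: a subject contributes a survival factor for each interval lived through and, if uncensored, a hazard factor for the interval of the event. A minimal NumPy sketch under one common interval convention (conventions vary, so treat the indexing as an assumption):

```python
import numpy as np

def nll_discrete_survival(hazards, t, event):
    """Negative log-likelihood for one subject in a discrete-time model.

    hazards: per-interval conditional event probabilities from the network.
    The subject survives intervals 0..t-1 and, if event is True, fails in
    interval t; if censored, only the survived intervals contribute.
    """
    ll = np.sum(np.log(1 - hazards[:t]))
    if event:
        ll += np.log(hazards[t])
    return -ll
```

Because the loss decomposes over subjects, it drops straight into minibatch SGD, which is what enables the scalability the abstract highlights.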
LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Title | LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling |
Authors | Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang |
Abstract | Machine learning algorithms, such as the Support Vector Machine (SVM) and the Deep Neural Network (DNN), have gained a lot of interest recently. When training a machine learning algorithm, randomly shuffling all the training data can improve the testing accuracy and boost the convergence rate. Nevertheless, realizing training-data random shuffling in a real system is not straightforward, due to the slow random accesses of hard disk drives (HDDs). To avoid frequent random disk accesses, existing approaches often limit the effect of random shuffling. With emerging non-volatile memory-based storage devices, such as the Intel Optane SSD, which provide fast random accesses, we propose a lightweight implementation of random shuffling (LIRS) that randomly shuffles the indexes of the entire training dataset; the selected training instances are directly accessed from storage and packed into batches. Experimental results show that LIRS can reduce the total training time of SVM and DNN by 49.9% and 43.5% on average, and improve the final testing accuracy on DNN by 1.01%. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04509v1 |
PDF | http://arxiv.org/pdf/1810.04509v1.pdf |
PWC | https://paperswithcode.com/paper/lirs-enabling-efficient-machine-learning-on |
Repo | https://github.com/winiel559/ZhiLin-LIRS |
Framework | none |
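The core of LIRS as described is index-level shuffling: permute only the index array in memory and read each selected instance directly from fast storage. In the sketch below, `read_instance` is a hypothetical stand-in for a random read of one example from the NVM device.

```python
import numpy as np

def lirs_batches(n_examples, batch_size, read_instance, seed=0):
    """Yield shuffled minibatches while keeping only indexes in memory.

    read_instance(i) performs a random read of example i from storage; the
    full dataset is never loaded into memory.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        idx = order[start:start + batch_size]
        yield [read_instance(int(i)) for i in idx]
```

This only pays off when random reads are cheap, which is exactly the property of NVM-based storage the paper exploits.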