October 21, 2019

3075 words 15 mins read

Paper Group AWR 152

Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems. One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases. Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints. Deep Continuous Clustering. Semi-Generative Modelling: Covariate-Shift Adap …

Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems

Title Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems
Authors Filipe Rodrigues, Francisco C. Pereira
Abstract Spatio-temporal problems are ubiquitous and of vital importance in many research fields. Despite the potential already demonstrated by deep learning methods in modeling spatio-temporal data, typical approaches tend to focus solely on conditional expectations of the output variables being modeled. In this paper, we propose a multi-output multi-quantile deep learning approach for jointly modeling several conditional quantiles together with the conditional expectation as a way to provide a more complete “picture” of the predictive density in spatio-temporal problems. Using two large-scale datasets from the transportation domain, we empirically demonstrate that, by approaching the quantile regression problem from a multi-task learning perspective, it is possible to solve the embarrassing quantile crossing problem, while simultaneously significantly outperforming state-of-the-art quantile regression methods. Moreover, we show that jointly modeling the mean and several conditional quantiles not only provides a rich description of the predictive density that can capture heteroscedastic properties at negligible computational overhead, but also leads to improved predictions of the conditional expectation due to the extra information and a regularization effect induced by the added quantiles.
Tasks Multi-Task Learning
Published 2018-08-27
URL http://arxiv.org/abs/1808.08798v1
PDF http://arxiv.org/pdf/1808.08798v1.pdf
PWC https://paperswithcode.com/paper/beyond-expectation-deep-joint-mean-and
Repo https://github.com/fmpr/DeepJMQR
Framework tf
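A minimal sketch of the joint objective: squared error for the conditional mean plus the tilted (pinball) loss for each quantile, computed in NumPy on placeholder arrays. The quantile levels, array shapes, and equal weighting of the terms are assumptions; the authors' TensorFlow implementation is in the linked repository.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Tilted loss for quantile level tau; penalizes over- and under-prediction asymmetrically."""
    e = y_true - y_pred
    return np.mean(np.maximum(tau * e, (tau - 1.0) * e))

# Placeholder predictions: one output for the mean and one per quantile (0.1, 0.5, 0.9).
y_true = np.random.rand(256)
preds = {"mean": np.random.rand(256), 0.1: np.random.rand(256),
         0.5: np.random.rand(256), 0.9: np.random.rand(256)}

joint_loss = np.mean((y_true - preds["mean"]) ** 2)                      # conditional-expectation term
joint_loss += sum(pinball_loss(y_true, preds[t], t) for t in (0.1, 0.5, 0.9))
print(joint_loss)
```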

One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases

Title One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases
Authors Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Peter Brusilovsky, Daqing He, Adam Trischler
Abstract Different texts naturally correspond to different numbers of keyphrases. This desideratum is largely missing from existing neural keyphrase generation models. In this study, we address this problem from both modeling and evaluation perspectives. We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques that manipulate decoder hidden states. In contrast to previous approaches, our model is capable of generating a variable number of diverse keyphrases. We further propose two evaluation metrics tailored towards variable-number generation. We also introduce a new dataset (StackEX) that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets.
Tasks
Published 2018-10-11
URL https://arxiv.org/abs/1810.05241v2
PDF https://arxiv.org/pdf/1810.05241v2.pdf
PWC https://paperswithcode.com/paper/generating-diverse-numbers-of-diverse
Repo https://github.com/memray/OpenNMT-kpg-release
Framework pytorch
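The sketch below illustrates two pieces of the setup described above, under assumed names and delimiters: splitting a delimiter-separated decoder output into a variable-size keyphrase set, and scoring the full prediction with F1 rather than a fixed top-k cutoff. The delimiter token and the example strings are illustrative, not the paper's exact configuration.

```python
# Hedged sketch: decode a delimiter-separated sequence into keyphrases, then score with F1.
def split_keyphrases(sequence, delimiter="<sep>"):
    return [p.strip() for p in sequence.split(delimiter) if p.strip()]

def f1_variable(predicted, gold):
    """F1 over the whole variable-size prediction set, with no fixed cutoff."""
    pred, gold = set(predicted), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

decoded = "keyphrase generation <sep> sequence to sequence <sep> evaluation metrics"
print(f1_variable(split_keyphrases(decoded), ["keyphrase generation", "evaluation metrics"]))
```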

Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints

Title Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Authors Ashutosh Baheti, Alan Ritter, Jiwei Li, Bill Dolan
Abstract Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content-rich responses, based on a model of syntax and topics (Griffiths et al., 2005) and semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2018-09-04
URL http://arxiv.org/abs/1809.01215v1
PDF http://arxiv.org/pdf/1809.01215v1.pdf
PWC https://paperswithcode.com/paper/generating-more-interesting-responses-in
Repo https://github.com/felicienveldema/IR2
Framework none
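A hedged sketch of the reranking intuition: combine each candidate's decoder log-likelihood with a penalty on the divergence between the source's topic distribution and the candidate's. The topic vectors, candidate responses, and the weight `alpha` are placeholders, not the paper's actual constraints or decoding procedure.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (with smoothing)."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

# Placeholder topic distributions (e.g., from an LDA-style model) and decoder log-likelihoods.
source_topics = np.array([0.6, 0.3, 0.1])
candidates = [
    ("i don't know", -1.0, np.array([0.33, 0.33, 0.34])),        # generic, topically flat
    ("the match ended in a draw", -2.5, np.array([0.55, 0.35, 0.10])),
]
alpha = 2.0  # weight of the distributional constraint (assumed)
best = max(candidates, key=lambda c: c[1] - alpha * kl(source_topics, c[2]))
print(best[0])
```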

Deep Continuous Clustering

Title Deep Continuous Clustering
Authors Sohil Atul Shah, Vladlen Koltun
Abstract Clustering high-dimensional datasets is hard because interpoint distances become less informative in high-dimensional spaces. We present a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly. The data is embedded into a lower-dimensional space by a deep autoencoder. The autoencoder is optimized as part of the clustering process. The resulting network produces clustered data. The presented approach does not rely on prior knowledge of the number of ground-truth clusters. Joint nonlinear dimensionality reduction and clustering are formulated as optimization of a global continuous objective. We thus avoid discrete reconfigurations of the objective that characterize prior clustering algorithms. Experiments on datasets from multiple domains demonstrate that the presented algorithm outperforms state-of-the-art clustering schemes, including recent methods that use deep networks.
Tasks Dimensionality Reduction
Published 2018-03-05
URL http://arxiv.org/abs/1803.01449v1
PDF http://arxiv.org/pdf/1803.01449v1.pdf
PWC https://paperswithcode.com/paper/deep-continuous-clustering
Repo https://github.com/waynezhanghk/gacluster
Framework pytorch
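A hedged PyTorch sketch of the joint objective: an autoencoder reconstruction term plus a robust (Geman-McClure) pairwise penalty that pulls embeddings of linked points together. The layer sizes, the random edge list standing in for a mutual-kNN graph, and the omitted graduated non-convexity schedule mean this is only an illustration of the idea, not the DCC algorithm itself.

```python
import torch
import torch.nn as nn

# Toy autoencoder and a placeholder neighborhood graph over 128 points of dimension 50.
enc = nn.Sequential(nn.Linear(50, 10))
dec = nn.Sequential(nn.Linear(10, 50))
x = torch.randn(128, 50)
edges = torch.randint(0, 128, (300, 2))          # stand-in for a mutual-kNN edge list
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
mu = 1.0                                         # Geman-McClure scale (assumed fixed here)

for step in range(200):
    z = enc(x)
    recon = ((dec(z) - x) ** 2).mean()                        # reconstruction term
    d2 = ((z[edges[:, 0]] - z[edges[:, 1]]) ** 2).sum(dim=1)
    pairwise = (mu * d2 / (mu + d2)).mean()                   # robust penalty on linked embeddings
    loss = recon + pairwise
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```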

Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features

Title Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features
Authors Julius von Kügelgen, Alexander Mey, Marco Loog
Abstract Current methods for covariate-shift adaptation use unlabelled data to compute importance weights or domain-invariant features, while the final model is trained on labelled data only. Here, we consider a particular case of covariate shift which allows us also to learn from unlabelled data, that is, combining adaptation with semi-supervised learning. Using ideas from causality, we argue that this requires learning with both causes, $X_C$, and effects, $X_E$, of a target variable, $Y$, and show how this setting leads to what we call a semi-generative model, $P(Y,X_E|X_C,\theta)$. Our approach is robust to domain shifts in the distribution of causal features and leverages unlabelled data by learning a direct map from causes to effects. Experiments on synthetic data demonstrate significant improvements in classification over purely-supervised and importance-weighting baselines.
Tasks Domain Adaptation
Published 2018-07-20
URL http://arxiv.org/abs/1807.07879v2
PDF http://arxiv.org/pdf/1807.07879v2.pdf
PWC https://paperswithcode.com/paper/semi-generative-modelling-covariate-shift
Repo https://github.com/Juliusvk/Semi-Generative-Modelling
Framework none
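A toy sketch of the semi-generative factorization on synthetic one-dimensional data: fit $P(Y|X_C)$ and $P(X_E|Y)$ separately and predict by Bayes' rule. The simplifying assumption that $X_E$ depends only on $Y$, the variable names, and the logistic-regression/Gaussian choices are all illustrative, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
x_c = rng.normal(size=n)                                   # cause feature
y = (x_c + 0.5 * rng.normal(size=n) > 0).astype(int)       # label caused by x_c
x_e = y + 0.3 * rng.normal(size=n)                         # effect feature caused by y

# Fit the two factors of the toy semi-generative model P(y, x_e | x_c) = P(y | x_c) P(x_e | y).
p_y_given_xc = LogisticRegression().fit(x_c.reshape(-1, 1), y)
mu = [x_e[y == k].mean() for k in (0, 1)]
sigma = [x_e[y == k].std() for k in (0, 1)]

def predict(xc_new, xe_new):
    prior = p_y_given_xc.predict_proba([[xc_new]])[0]
    lik = np.array([np.exp(-0.5 * ((xe_new - mu[k]) / sigma[k]) ** 2) / sigma[k] for k in (0, 1)])
    post = prior * lik
    return int(np.argmax(post / post.sum()))

print(predict(0.2, 0.9))
```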

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Title Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Authors Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen
Abstract Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn’t offer optimal performance for a user’s particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, (3) a compilation cache populated by an autotuner. [Abstract cutoff]
Tasks Scene Understanding
Published 2018-02-13
URL http://arxiv.org/abs/1802.04730v3
PDF http://arxiv.org/pdf/1802.04730v3.pdf
PWC https://paperswithcode.com/paper/tensor-comprehensions-framework-agnostic-high
Repo https://github.com/AIwithSwift/TFWorld2019-SwiftIn3Hours
Framework tf
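The Tensor Comprehensions language itself is the core idea above: an operator is written in index notation close to its mathematical definition and then JIT-compiled to a CUDA kernel. The matrix-multiplication comprehension below follows the notation shown in the paper; it is held in a plain Python string here, and no compilation API is shown since the exact package interface is not reproduced from the source.

```python
# A Tensor Comprehensions definition in the paper's index notation, stored as a string.
# "+=!" means the accumulator C is zero-initialized and then reduced over the free index k.
MATMUL_TC = """
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, k) * B(k, n)
}
"""
print(MATMUL_TC)
```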

Modeling Semantic Plausibility by Injecting World Knowledge

Title Modeling Semantic Plausibility by Injecting World Knowledge
Authors Su Wang, Greg Durrett, Katrin Erk
Abstract Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as “man swallow paintball”. Simple models based on distributional representations perform poorly on this task, despite doing well on selection preference, but injecting manually elicited knowledge about entity properties provides a substantial performance boost. Our error analysis shows that our new dataset is a great testbed for semantic plausibility models: more sophisticated knowledge representation and propagation could address many of the remaining errors.
Tasks
Published 2018-04-02
URL http://arxiv.org/abs/1804.00619v3
PDF http://arxiv.org/pdf/1804.00619v3.pdf
PWC https://paperswithcode.com/paper/modeling-semantic-plausibility-by-injecting
Repo https://github.com/suwangcompling/Modeling-Semantic-Plausibility-NAACL18
Framework none
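A hedged sketch of the injection idea: concatenate distributional (embedding) features with manually elicited entity-property features before classification. The random arrays, the three made-up property columns, and the logistic-regression classifier are placeholders, not the paper's dataset or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
embeddings = rng.normal(size=(n, 50))            # stand-in for distributional s-v-o vectors
properties = rng.integers(0, 4, size=(n, 3))     # stand-in for elicited attributes (e.g. size, rigidity)
labels = rng.integers(0, 2, size=n)              # plausible vs. implausible (random here)

features = np.hstack([embeddings, properties])   # "inject" world knowledge by concatenation
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.score(features, labels))
```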

Image Segmentation using Sparse Subset Selection

Title Image Segmentation using Sparse Subset Selection
Authors Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar
Abstract In this paper, we present a new image segmentation method based on the concept of sparse subset selection. Starting with an over-segmentation, we adopt local spectral histogram features to encode the visual information of the small segments into high-dimensional vectors, called superpixel features. Then, the superpixel features are fed into a novel convex model which efficiently leverages the features to group the superpixels into a proper number of coherent regions. Our model automatically determines the optimal number of coherent regions and the assignment of superpixels that shapes the final segments. To solve our model, we propose a numerical algorithm based on the alternating direction method of multipliers (ADMM), whose iterations consist of two highly parallelizable sub-problems. We show that each sub-problem enjoys a closed-form solution, which makes the ADMM iterations computationally very efficient. Extensive experiments on benchmark image segmentation datasets demonstrate that our proposed method, in combination with an over-segmentation, can provide high-quality and competitive results compared to existing state-of-the-art methods.
Tasks Semantic Segmentation
Published 2018-04-08
URL http://arxiv.org/abs/1804.02721v1
PDF http://arxiv.org/pdf/1804.02721v1.pdf
PWC https://paperswithcode.com/paper/image-segmentation-using-sparse-subset
Repo https://github.com/mohsenkheirandishfard/IS4
Framework none
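A hedged sketch of sparse subset selection over superpixel features, solved here with a generic convex-programming library (cvxpy) rather than the paper's ADMM algorithm: encourage row sparsity in an assignment matrix so that only a few superpixels act as representatives of coherent regions. Feature shapes and the regularization weight are assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
F = rng.random((25, 8))                                     # 25 placeholder superpixel feature vectors
D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)   # pairwise dissimilarities

Z = cp.Variable((25, 25), nonneg=True)
lam = 0.5                                                   # row-sparsity weight (assumed)
objective = cp.Minimize(cp.sum(cp.multiply(D, Z)) + lam * cp.sum(cp.norm(Z, 2, axis=1)))
problem = cp.Problem(objective, [cp.sum(Z, axis=0) == 1])   # each superpixel fully assigned
problem.solve()

representatives = np.where(np.linalg.norm(Z.value, axis=1) > 1e-3)[0]
print(representatives)   # indices of superpixels selected as region representatives
```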

WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

Title WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Authors Hyunsun Choi, Eric Jang, Alexander A. Alemi
Abstract Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated from a different stochastic generator than the one used to generate the training data. One proposal to scale OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution, and use it to reject unlikely inputs. However, likelihood models on natural data are themselves susceptible to OoD errors, and even assign large likelihoods to samples from other datasets. To mitigate this problem, we propose Generative Ensembles, which robustify density-based OoD detection by way of estimating epistemic uncertainty of the likelihood model. We present a puzzling observation in need of an explanation – although likelihood measures cannot account for the typical set of a distribution, and therefore should not be suitable on their own for OoD detection, WAIC performs surprisingly well in practice.
Tasks Anomaly Detection
Published 2018-10-02
URL https://arxiv.org/abs/1810.01392v4
PDF https://arxiv.org/pdf/1810.01392v4.pdf
PWC https://paperswithcode.com/paper/waic-but-why-generative-ensembles-for-robust
Repo https://github.com/ericjang/odin
Framework pytorch
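The WAIC score used above is simple to state: the ensemble-mean log-likelihood penalized by the ensemble variance. The sketch below computes it in NumPy on random numbers standing in for real ensemble outputs; the OoD threshold is an arbitrary placeholder.

```python
import numpy as np

# log_px[i, j] would be the log-likelihood of input j under ensemble member i;
# random values stand in for real generative-model outputs here.
log_px = np.random.randn(5, 100)
waic = log_px.mean(axis=0) - log_px.var(axis=0)   # penalize inputs the ensemble disagrees on
is_ood = waic < np.quantile(waic, 0.05)           # placeholder threshold for flagging OoD inputs
print(is_ood.sum(), "inputs flagged as out-of-distribution")
```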

Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs

Title Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs
Authors Patrick Wenzel, Qadeer Khan, Daniel Cremers, Laura Leal-Taixé
Abstract Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred using semantic maps to new weather conditions without the need to obtain new ground truth data. To this end, we propose to divide the task of vehicle control into two independent modules: a control module which is only trained on one weather condition for which labeled steering data is available, and a perception module which is used as an interface between new weather conditions and the fixed control module. To generate the semantic data needed to train the perception module, we propose to use a generative adversarial network (GAN)-based model to retrieve the semantic information for the new conditions in an unsupervised manner. We introduce a master-servant architecture, where the master model (semantic labels available) trains the servant model (semantic labels not available). We show that our proposed method trained with ground truth data for a single weather condition is capable of achieving similar results on the task of steering angle prediction as an end-to-end model trained with ground truth data of 15 different weather conditions.
Tasks Self-Driving Cars
Published 2018-07-03
URL http://arxiv.org/abs/1807.01001v2
PDF http://arxiv.org/pdf/1807.01001v2.pdf
PWC https://paperswithcode.com/paper/modular-vehicle-control-for-transferring
Repo https://github.com/pmwenzel/carla-domain-adaptation
Framework none
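A hedged PyTorch sketch of the modular split: a perception module maps the camera image to a semantic representation (the part that would be retrained or adapted per weather condition, e.g. via the GAN-based transfer), while a fixed control module maps that representation to a steering command. Layer sizes, channel counts, and image shapes are placeholders.

```python
import torch
import torch.nn as nn

perception = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(8, 5, 1))              # 5 placeholder "semantic" channels
control = nn.Sequential(nn.Flatten(), nn.Linear(5 * 64 * 64, 64), nn.ReLU(),
                        nn.Linear(64, 1))                    # steering angle

image = torch.randn(1, 3, 64, 64)       # placeholder camera frame
semantic_map = perception(image)        # module adapted to new weather conditions
steering = control(semantic_map)        # module kept fixed once trained on the labeled condition
print(steering.shape)
```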

A Machine Learning Approach for Virtual Flow Metering and Forecasting

Title A Machine Learning Approach for Virtual Flow Metering and Forecasting
Authors Nikolai Andrianov
Abstract We are concerned with robust and accurate forecasting of multiphase flow rates in wells and pipelines during oil and gas production. In practice, the possibility to physically measure the rates is often limited; besides, it is desirable to estimate future values of multiphase rates based on the previous behavior of the system. In this work, we demonstrate that a Long Short-Term Memory (LSTM) recurrent neural network is able not only to accurately estimate the multiphase rates at the current time (i.e., act as a virtual flow meter), but also to forecast the rates for a sequence of future time instants. For a synthetic severe slugging case, LSTM forecasts compare favorably with the results of hydrodynamical modeling. LSTM results for a realistic noisy dataset of a variable-rate well test show that the model can also successfully forecast multiphase rates for a system with changing flow patterns.
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05698v1
PDF http://arxiv.org/pdf/1802.05698v1.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-approach-for-virtual-flow
Repo https://github.com/nikolai-andrianov/VFM
Framework none
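A minimal Keras sketch of the forecasting setup, assuming a sliding window of past sensor readings as input and the next-step multiphase rates as output. The window length, feature count, layer sizes, and random data are placeholders, not the paper's configuration.

```python
import numpy as np
import tensorflow as tf

window, n_features, n_rates = 30, 6, 3
X = np.random.rand(500, window, n_features).astype("float32")   # placeholder sensor windows
y = np.random.rand(500, n_rates).astype("float32")              # placeholder next-step rates

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, n_features)),
    tf.keras.layers.Dense(n_rates),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1], verbose=0))
```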

Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction

Title Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction
Authors Po-Yu Kao, Thuyen Ngo, Angela Zhang, Jefferson W. Chen, B. S. Manjunath
Abstract This paper introduces a novel methodology to integrate human brain connectomics and parcellation for brain tumor segmentation and survival prediction. For segmentation, we utilize an existing brain parcellation atlas in the MNI152 1mm space and map this parcellation to each individual subject's data. We use deep neural network architectures together with hard negative mining to achieve the final voxel level classification. For survival prediction, we present a new method for combining features from connectomics data, brain parcellation information, and the brain tumor mask. We leverage the average connectome information from the Human Connectome Project and map each subject's brain volume onto this common connectome space. From this, we compute tractographic features that describe potential neural disruptions due to the brain tumor. These features are then used to predict the overall survival of the subjects. The main novelty in the proposed methods is the use of normalized brain parcellation data and tractography data from the Human Connectome Project for analyzing MR images for segmentation and survival prediction. Experimental results are reported on the BraTS2018 dataset.
Tasks Brain Tumor Segmentation
Published 2018-07-20
URL http://arxiv.org/abs/1807.07716v3
PDF http://arxiv.org/pdf/1807.07716v3.pdf
PWC https://paperswithcode.com/paper/brain-tumor-segmentation-and-tractographic
Repo https://github.com/pykao/BraTS2018-tumor-segmentation
Framework pytorch
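A hedged sketch of one parcellation-based feature in the spirit described above: the fraction of each atlas parcel overlapped by the tumor mask once both are in a common space. The random volumes stand in for a real parcellation and segmentation; the paper's tractographic features additionally use connectome data, which is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
parcellation = rng.integers(0, 10, size=(32, 32, 32))   # placeholder parcel labels 0..9
tumor_mask = rng.random((32, 32, 32)) > 0.9             # placeholder binary tumor segmentation

n_parcels = int(parcellation.max()) + 1
features = np.array([
    tumor_mask[parcellation == p].mean() for p in range(n_parcels)
])
print(features)   # one disruption score per parcel, usable as input to a survival model
```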

NE-Table: A Neural key-value table for Named Entities

Title NE-Table: A Neural key-value table for Named Entities
Authors Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos
Abstract Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, the present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution for this problem, and present empirical evaluations on: a) a structured Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a Reading-Comprehension task, which show that the proposed method can be effective in dealing with both in-vocabulary and OOV NEs. We create extended versions of dialog bAbI tasks 1,2 and 4 and OOV versions of the CBT test set available at - https://github.com/IBM/ne-table-datasets.
Tasks Goal-Oriented Dialog, Question Answering, Reading Comprehension, Word Embeddings
Published 2018-04-22
URL https://arxiv.org/abs/1804.09540v2
PDF https://arxiv.org/pdf/1804.09540v2.pdf
PWC https://paperswithcode.com/paper/named-entities-troubling-your-neural-methods
Repo https://github.com/IBM/ne-table-datasets
Framework none
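A hedged sketch of the key-value-table idea: each named entity seen in the input is stored with a key vector, and a query later retrieves its value by soft attention over the keys. The dimensions, example entities, and retrieval rule are illustrative only, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["Paris", "IBM", "bAbI"]            # placeholder named entities
keys = rng.normal(size=(3, 16))                # key embeddings written when the NEs are read
values = np.arange(3)                          # here simply the row index of each stored NE

query = keys[1] + 0.1 * rng.normal(size=16)            # noisy query that should match "IBM"
scores = keys @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()                               # soft attention over table keys
print(entities[int(values[np.argmax(weights)])])       # retrieved entity
```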

A Scalable Discrete-Time Survival Model for Neural Networks

Title A Scalable Discrete-Time Survival Model for Neural Networks
Authors Michael F. Gensheimer, Balasubramanian Narasimhan
Abstract There is currently great interest in applying neural networks to prediction tasks in medicine. It is important for predictive models to be able to use survival data, where each patient has a known follow-up time and event/censoring indicator. This avoids information loss when training the model and enables generation of predicted survival curves. In this paper, we describe a discrete-time survival model that is designed to be used with neural networks, which we refer to as Nnet-survival. The model is trained with the maximum likelihood method using minibatch stochastic gradient descent (SGD). The use of SGD enables rapid convergence and application to large datasets that do not fit in memory. The model is flexible, so that the baseline hazard rate and the effect of the input data on hazard probability can vary with follow-up time. It has been implemented in the Keras deep learning framework, and source code for the model and several examples is available online. We demonstrate the performance of the model on both simulated and real data and compare it to existing models Cox-nnet and Deepsurv.
Tasks Survival Analysis
Published 2018-05-02
URL http://arxiv.org/abs/1805.00917v3
PDF http://arxiv.org/pdf/1805.00917v3.pdf
PWC https://paperswithcode.com/paper/a-scalable-discrete-time-survival-model-for
Repo https://github.com/MGensheimer/nnet-survival
Framework tf
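A hedged NumPy sketch of a discrete-time survival negative log-likelihood of the kind such a model optimizes: a patient contributes log(1 - hazard) for every interval survived and, in the final interval, log(hazard) if the event was observed. The handling of censoring within an interval and the minibatch formulation differ in the actual Nnet-survival implementation.

```python
import numpy as np

def discrete_surv_nll(hazards, event_interval, observed):
    """hazards: (n, J) per-interval hazard probabilities (e.g. sigmoid network outputs);
    event_interval: (n,) index of the interval containing the event or censoring time;
    observed: (n,) 1 if the event was observed, 0 if censored."""
    eps = 1e-7
    nll = 0.0
    for h, j, d in zip(hazards, event_interval, observed):
        nll -= np.sum(np.log(1 - h[:j] + eps))                              # intervals survived before j
        nll -= d * np.log(h[j] + eps) + (1 - d) * np.log(1 - h[j] + eps)    # event or censoring in j
    return nll / len(hazards)

hazards = np.random.rand(4, 5) * 0.2
print(discrete_surv_nll(hazards, np.array([2, 4, 1, 3]), np.array([1, 0, 1, 1])))
```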

LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling

Title LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Authors Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang
Abstract Machine learning algorithms, such as Support Vector Machine (SVM) and Deep Neural Network (DNN), have gained a lot of interest recently. When training a machine learning algorithm, randomly shuffling all the training data can improve the testing accuracy and boost the convergence rate. Nevertheless, realizing training-data random shuffling in a real system is not a straightforward process, due to the slow random accesses of hard disk drives (HDDs). To avoid frequent random disk accesses, the effect of random shuffling is often limited in existing approaches. With emerging non-volatile memory-based storage devices, such as the Intel Optane SSD, which provide fast random accesses, we propose a lightweight implementation of random shuffling (LIRS) to randomly shuffle the indexes of the entire training dataset; the selected training instances are then directly accessed from storage and packed into batches. Experimental results show that LIRS can reduce the total training time of SVM and DNN by 49.9% and 43.5% on average, and improve the final testing accuracy of DNN by 1.01%.
Tasks
Published 2018-10-10
URL http://arxiv.org/abs/1810.04509v1
PDF http://arxiv.org/pdf/1810.04509v1.pdf
PWC https://paperswithcode.com/paper/lirs-enabling-efficient-machine-learning-on
Repo https://github.com/winiel559/ZhiLin-LIRS
Framework none
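A hedged sketch of the index-shuffling idea: permute only an index array each epoch and gather the selected instances directly from storage (a memory-mapped file standing in for an NVM-backed device), rather than shuffling or copying the dataset itself. The file name, shapes, and batch size are assumptions, not the paper's implementation.

```python
import numpy as np

# Placeholder dataset on "storage": a memory-mapped file stands in for an NVM-backed device.
n, d = 10_000, 128
data = np.memmap("train.bin", dtype="float32", mode="w+", shape=(n, d))
data[:] = np.random.rand(n, d)

indices = np.random.permutation(n)        # lightweight shuffle: permute indexes, not the data
batch_size = 256
for start in range(0, n, batch_size):
    batch_idx = indices[start:start + batch_size]
    batch = data[batch_idx]               # gather the selected instances directly from storage
    # ... feed `batch` (and its labels) to the SVM/DNN training step ...
```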