Paper Group AWR 152
Papers in this group: Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems; One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases; Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints; Deep Continuous Clustering; Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features; Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions; Modeling Semantic Plausibility by Injecting World Knowledge; Image Segmentation using Sparse Subset Selection; WAIC, but Why? Generative Ensembles for Robust Anomaly Detection; Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs; A Machine Learning Approach for Virtual Flow Metering and Forecasting; Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction; NE-Table: A Neural key-value table for Named Entities; A Scalable Discrete-Time Survival Model for Neural Networks; LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling.
Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems
Title | Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems |
Authors | Filipe Rodrigues, Francisco C. Pereira |
Abstract | Spatio-temporal problems are ubiquitous and of vital importance in many research fields. Despite the potential already demonstrated by deep learning methods in modeling spatio-temporal data, typical approaches tend to focus solely on conditional expectations of the output variables being modeled. In this paper, we propose a multi-output multi-quantile deep learning approach for jointly modeling several conditional quantiles together with the conditional expectation as a way to provide a more complete “picture” of the predictive density in spatio-temporal problems. Using two large-scale datasets from the transportation domain, we empirically demonstrate that, by approaching the quantile regression problem from a multi-task learning perspective, it is possible to solve the embarrassing problem of quantile crossing, while simultaneously significantly outperforming state-of-the-art quantile regression methods. Moreover, we show that jointly modeling the mean and several conditional quantiles not only provides a rich description of the predictive density that can capture heteroscedastic properties at a negligible computational overhead, but also leads to improved predictions of the conditional expectation, due to the extra information and the regularization effect induced by the added quantiles. |
Tasks | Multi-Task Learning |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08798v1 |
PDF | http://arxiv.org/pdf/1808.08798v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-expectation-deep-joint-mean-and |
Repo | https://github.com/fmpr/DeepJMQR |
Framework | tf |
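As a concrete illustration of the joint objective the abstract describes, here is a minimal NumPy sketch that combines an MSE term for the mean head with pinball (tilted absolute) losses for the quantile heads. The equal weighting of the terms and the head layout are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Tilted absolute loss for quantile level tau in (0, 1)."""
    diff = y - y_hat
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def joint_loss(y, mean_pred, quantile_preds, taus):
    """MSE on the mean head plus a pinball loss per quantile head.

    Equal weighting of the terms is an assumption, not the paper's tuning.
    """
    mse = np.mean((y - mean_pred) ** 2)
    return mse + sum(pinball_loss(y, q, t) for q, t in zip(quantile_preds, taus))

# Toy usage: a mean head plus three quantile heads (10th, 50th, 90th percentile).
y = np.random.randn(100)
loss = joint_loss(y, np.zeros(100), [np.zeros(100)] * 3, [0.1, 0.5, 0.9])
```

Minimizing all terms through shared network layers is what lets the added quantile heads act as a regularizer on the mean prediction, as the abstract notes.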
One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases
Title | One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases |
Authors | Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Peter Brusilovsky, Daqing He, Adam Trischler |
Abstract | Different texts naturally correspond to different numbers of keyphrases; this desideratum is largely missing from existing neural keyphrase generation models. In this study, we address the problem from both modeling and evaluation perspectives. We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques that manipulate decoder hidden states. In contrast to previous approaches, our model is capable of generating a variable number of diverse keyphrases. We further propose two evaluation metrics tailored towards variable-number generation. We also introduce a new dataset (StackEX) that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets. |
Tasks | |
Published | 2018-10-11 |
URL | https://arxiv.org/abs/1810.05241v2 |
PDF | https://arxiv.org/pdf/1810.05241v2.pdf |
PWC | https://paperswithcode.com/paper/generating-diverse-numbers-of-diverse |
Repo | https://github.com/memray/OpenNMT-kpg-release |
Framework | pytorch |
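To make the delimiter-separated generation concrete, here is a small sketch of post-processing one decoded token sequence into a variable-size keyphrase set and scoring it with an F1 that uses no fixed cutoff. The `<sep>`/`<eos>` token names and this exact metric definition are assumptions, not necessarily the paper's.

```python
def split_keyphrases(decoded_tokens, delimiter="<sep>", eos="<eos>"):
    """Recover a variable-size keyphrase set from one delimiter-separated sequence."""
    phrases, current = [], []
    for tok in decoded_tokens:
        if tok == eos:
            break
        if tok == delimiter:
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

def f1_at_variable_number(predicted, gold):
    """F1 over the full predicted set, with no fixed cutoff k."""
    pred, gold = set(predicted), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```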
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Title | Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints |
Authors | Ashutosh Baheti, Alan Ritter, Jiwei Li, Bill Dolan |
Abstract | Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content-rich responses, based on a model of syntax and topics (Griffiths et al., 2005) and on semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01215v1 |
PDF | http://arxiv.org/pdf/1809.01215v1.pdf |
PWC | https://paperswithcode.com/paper/generating-more-interesting-responses-in |
Repo | https://github.com/felicienveldema/IR2 |
Framework | none |
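A minimal sketch of decoding with distributional constraints, assuming the constraints enter as additive terms when reranking beam candidates. Here `topic_score` and `sim_score` stand in for the paper's topic-model and sentence-similarity components, and the linear combination with weights `l1`, `l2` is an illustrative assumption.

```python
def rerank(candidates, log_p, topic_score, sim_score, l1=1.0, l2=1.0):
    """Pick the beam candidate maximizing likelihood plus constraint scores."""
    return max(candidates,
               key=lambda c: log_p(c) + l1 * topic_score(c) + l2 * sim_score(c))
```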
Deep Continuous Clustering
Title | Deep Continuous Clustering |
Authors | Sohil Atul Shah, Vladlen Koltun |
Abstract | Clustering high-dimensional datasets is hard because interpoint distances become less informative in high-dimensional spaces. We present a clustering algorithm that performs nonlinear dimensionality reduction and clustering jointly. The data is embedded into a lower-dimensional space by a deep autoencoder. The autoencoder is optimized as part of the clustering process. The resulting network produces clustered data. The presented approach does not rely on prior knowledge of the number of ground-truth clusters. Joint nonlinear dimensionality reduction and clustering are formulated as optimization of a global continuous objective. We thus avoid discrete reconfigurations of the objective that characterize prior clustering algorithms. Experiments on datasets from multiple domains demonstrate that the presented algorithm outperforms state-of-the-art clustering schemes, including recent methods that use deep networks. |
Tasks | Dimensionality Reduction |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01449v1 |
PDF | http://arxiv.org/pdf/1803.01449v1.pdf |
PWC | https://paperswithcode.com/paper/deep-continuous-clustering |
Repo | https://github.com/waynezhanghk/gacluster |
Framework | pytorch |
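The joint objective can be pictured as an autoencoder reconstruction term plus a robust continuous-clustering term over embeddings. The simplified NumPy sketch below (Geman-McClure penalty, mutual-kNN edge list, equal data-term weighting) is an assumption-laden illustration of the idea, not the paper's exact formulation.

```python
import numpy as np

def geman_mcclure(r, mu):
    """Robust penalty: roughly quadratic for small residuals, saturates for large ones."""
    return mu * r ** 2 / (mu + r ** 2)

def dcc_objective(x, x_rec, z, rep, edges, lam=1.0, mu=1.0):
    """Reconstruction plus a continuous clustering term over embeddings.

    z: (n, d) embeddings; rep: (n, d) per-point representatives; edges: index
    pairs from a mutual-kNN graph. A simplified sketch of the joint objective.
    """
    recon = np.sum((x - x_rec) ** 2)
    data_term = np.sum((z - rep) ** 2)
    pair_term = sum(geman_mcclure(np.linalg.norm(rep[i] - rep[j]), mu)
                    for i, j in edges)
    return recon + data_term + lam * pair_term
```

As representatives of connected points are pulled together, clusters emerge from a single continuous optimization, with no discrete reassignment steps.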
Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features
Title | Semi-Generative Modelling: Covariate-Shift Adaptation with Cause and Effect Features |
Authors | Julius von Kügelgen, Alexander Mey, Marco Loog |
Abstract | Current methods for covariate-shift adaptation use unlabelled data to compute importance weights or domain-invariant features, while the final model is trained on labelled data only. Here, we consider a particular case of covariate shift which allows us also to learn from unlabelled data, that is, combining adaptation with semi-supervised learning. Using ideas from causality, we argue that this requires learning with both causes, $X_C$, and effects, $X_E$, of a target variable, $Y$, and show how this setting leads to what we call a semi-generative model, $P(Y, X_E \mid X_C; \theta)$. Our approach is robust to domain shifts in the distribution of causal features and leverages unlabelled data by learning a direct map from causes to effects. Experiments on synthetic data demonstrate significant improvements in classification over purely-supervised and importance-weighting baselines. |
Tasks | Domain Adaptation |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07879v2 |
PDF | http://arxiv.org/pdf/1807.07879v2.pdf |
PWC | https://paperswithcode.com/paper/semi-generative-modelling-covariate-shift |
Repo | https://github.com/Juliusvk/Semi-Generative-Modelling |
Framework | none |
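For a discrete label $Y$, the semi-generative factorization lets unlabelled points contribute through the marginal $P(X_E \mid X_C) = \sum_y P(y \mid X_C)\,P(X_E \mid y, X_C)$. A minimal sketch, assuming the two conditional factors have already been evaluated for a given point as per-class arrays:

```python
import numpy as np

def labelled_ll(p_y_given_xc, p_xe_given_y_xc, y):
    """log P(y, x_E | x_C) for a labelled point; arrays are indexed by class."""
    return np.log(p_y_given_xc[y]) + np.log(p_xe_given_y_xc[y])

def unlabelled_ll(p_y_given_xc, p_xe_given_y_xc):
    """log P(x_E | x_C) for an unlabelled point: marginalize out the label."""
    return np.log(np.sum(p_y_given_xc * p_xe_given_y_xc))
```

Summing both kinds of terms over the dataset gives a single likelihood in which unlabelled data directly constrains the cause-to-effect map.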
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Title | Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions |
Authors | Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen |
Abstract | Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks, such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation, and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, and distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn’t offer optimal performance for a user’s particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, and (3) a compilation cache populated by an autotuner. [Abstract cutoff] |
Tasks | Scene Understanding |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04730v3 |
PDF | http://arxiv.org/pdf/1802.04730v3.pdf |
PWC | https://paperswithcode.com/paper/tensor-comprehensions-framework-agnostic-high |
Repo | https://github.com/AIwithSwift/TFWorld2019-SwiftIn3Hours |
Framework | tf |
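The comprehension language writes an operator as an index expression with an explicit reduction, and `np.einsum` captures the same semantics in Python. The TC line in the comment follows the matrix-multiplication example commonly shown for the language and is quoted from memory, so treat it as indicative syntax rather than a verified snippet.

```python
import numpy as np

# A Tensor Comprehension along the lines of
#   def mm(float(M,K) A, float(K,N) B) -> (C) { C(m,n) +=! A(m,k) * B(k,n) }
# declares output indices (m, n), reduces over the free index k with +=!,
# and infers C's shape. np.einsum expresses the same contraction:
A, B = np.random.rand(4, 5), np.random.rand(5, 3)
C = np.einsum("mk,kn->mn", A, B)  # C(m,n) = sum_k A(m,k) * B(k,n)
assert C.shape == (4, 3)
```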
Modeling Semantic Plausibility by Injecting World Knowledge
Title | Modeling Semantic Plausibility by Injecting World Knowledge |
Authors | Su Wang, Greg Durrett, Katrin Erk |
Abstract | Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since the latter is never attested. However, both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as “man swallow paintball”. Simple models based on distributional representations perform poorly on this task, despite doing well on selectional preference, but injecting manually elicited knowledge about entity properties provides a substantial performance boost. Our error analysis shows that our new dataset is a great testbed for semantic plausibility models: more sophisticated knowledge representation and propagation could address many of the remaining errors. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00619v3 |
PDF | http://arxiv.org/pdf/1804.00619v3.pdf |
PWC | https://paperswithcode.com/paper/modeling-semantic-plausibility-by-injecting |
Repo | https://github.com/suwangcompling/Modeling-Semantic-Plausibility-NAACL18 |
Framework | none |
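One way to picture "injecting world knowledge" is to concatenate manually elicited entity-property vectors onto the distributional embeddings before classification. The sketch below uses hypothetical lookup tables (`emb`, `props`) and is an illustration of the injection idea, not the paper's exact featurization.

```python
import numpy as np

def plausibility_features(emb, props, triple):
    """Build features for a subject-verb-object event like ("man", "swallow", "paintball").

    emb maps words to dense vectors; props maps nouns to manually elicited
    property vectors (e.g. size, rigidity). Both tables are assumptions.
    """
    s, v, o = triple
    return np.concatenate([emb[s], emb[v], emb[o], props[s], props[o]])
```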
Image Segmentation using Sparse Subset Selection
Title | Image Segmentation using Sparse Subset Selection |
Authors | Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar |
Abstract | In this paper, we present a new image segmentation method based on the concept of sparse subset selection. Starting with an over-segmentation, we adopt local spectral histogram features to encode the visual information of the small segments into high-dimensional vectors, called superpixel features. The superpixel features are then fed into a novel convex model which efficiently leverages the features to group the superpixels into a proper number of coherent regions. Our model automatically determines the optimal number of coherent regions and the assignment of superpixels that shapes the final segments. To solve our model, we propose a numerical algorithm based on the alternating direction method of multipliers (ADMM), whose iterations consist of two highly parallelizable sub-problems. We show that each sub-problem enjoys a closed-form solution, which makes the ADMM iterations computationally very efficient. Extensive experiments on benchmark image segmentation datasets demonstrate that our proposed method, in combination with an over-segmentation, can provide high-quality and competitive results compared to the existing state-of-the-art methods. |
Tasks | Semantic Segmentation |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02721v1 |
PDF | http://arxiv.org/pdf/1804.02721v1.pdf |
PWC | https://paperswithcode.com/paper/image-segmentation-using-sparse-subset |
Repo | https://github.com/mohsenkheirandishfard/IS4 |
Framework | none |
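The solver structure the abstract describes fits the standard scaled-form ADMM template for minimizing f(x) + g(z) subject to Ax + Bz = c. The sketch below leaves the two closed-form sub-problem solvers as user-supplied callables, since their exact form is specific to the paper's convex model; shapes and the fixed iteration count are simplifications.

```python
import numpy as np

def admm(step_x, step_z, A, B, c, iters=100):
    """Generic scaled-form ADMM loop for min f(x) + g(z)  s.t.  A x + B z = c."""
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    u = np.zeros(c.shape)            # scaled dual variable
    for _ in range(iters):
        x = step_x(z, u)             # argmin_x of the augmented Lagrangian
        z = step_z(x, u)             # argmin_z, also closed form in the paper
        u = u + A @ x + B @ z - c    # dual (multiplier) update
    return x, z
```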
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Title | WAIC, but Why? Generative Ensembles for Robust Anomaly Detection |
Authors | Hyunsun Choi, Eric Jang, Alexander A. Alemi |
Abstract | Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated from a different stochastic generator than the one used to generate the training data. One proposal to scale OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution and use it to reject unlikely inputs. However, likelihood models on natural data are themselves susceptible to OoD errors, and even assign large likelihoods to samples from other datasets. To mitigate this problem, we propose Generative Ensembles, which robustify density-based OoD detection by estimating the epistemic uncertainty of the likelihood model. We present a puzzling observation in need of an explanation: although likelihood measures cannot account for the typical set of a distribution, and therefore should not be suitable on their own for OoD detection, WAIC performs surprisingly well in practice. |
Tasks | Anomaly Detection |
Published | 2018-10-02 |
URL | https://arxiv.org/abs/1810.01392v4 |
PDF | https://arxiv.org/pdf/1810.01392v4.pdf |
PWC | https://paperswithcode.com/paper/waic-but-why-generative-ensembles-for-robust |
Repo | https://github.com/ericjang/odin |
Framework | pytorch |
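The WAIC score referenced in the title is easy to state: average the ensemble members' log-likelihoods of an input and subtract their variance, so inputs the members disagree on are penalized. A minimal sketch:

```python
import numpy as np

def waic_score(log_px):
    """WAIC(x) = E_theta[log p(x)] - Var_theta[log p(x)].

    log_px: shape (n_models,), each ensemble member's log-likelihood of the
    same input x. Lower scores flag likely out-of-distribution inputs.
    """
    return np.mean(log_px) - np.var(log_px)
```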
Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs
Title | Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs |
Authors | Patrick Wenzel, Qadeer Khan, Daniel Cremers, Laura Leal-Taixé |
Abstract | Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred, using semantic maps, to new weather conditions without the need to obtain new ground truth data. To this end, we propose to divide the task of vehicle control into two independent modules: a control module, which is trained on only one weather condition for which labeled steering data is available, and a perception module, which is used as an interface between new weather conditions and the fixed control module. To generate the semantic data needed to train the perception module, we propose to use a generative adversarial network (GAN)-based model to retrieve the semantic information for the new conditions in an unsupervised manner. We introduce a master-servant architecture, where the master model (semantic labels available) trains the servant model (semantic labels not available). We show that our proposed method, trained with ground truth data for a single weather condition, achieves results on the task of steering angle prediction comparable to those of an end-to-end model trained with ground truth data of 15 different weather conditions. |
Tasks | Self-Driving Cars |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01001v2 |
PDF | http://arxiv.org/pdf/1807.01001v2.pdf |
PWC | https://paperswithcode.com/paper/modular-vehicle-control-for-transferring |
Repo | https://github.com/pmwenzel/carla-domain-adaptation |
Framework | none |
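The modular split can be summarized in a few lines, with both modules as stand-in callables for the paper's networks; the point is that only the perception module changes with weather, while the control module stays fixed.

```python
def predict_steering(image, perception, control):
    """Two-module pipeline sketch.

    perception: maps raw pixels (any weather) to a semantic map, e.g. a
    GAN-based translator trained per condition. control: maps semantic maps
    to a steering angle, trained once on a single labeled weather condition.
    """
    semantic_map = perception(image)  # swapped per weather condition
    return control(semantic_map)      # fixed module, never retrained
```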
A Machine Learning Approach for Virtual Flow Metering and Forecasting
Title | A Machine Learning Approach for Virtual Flow Metering and Forecasting |
Authors | Nikolai Andrianov |
Abstract | We are concerned with robust and accurate forecasting of multiphase flow rates in wells and pipelines during oil and gas production. In practice, the possibility to physically measure the rates is often limited; besides, it is desirable to estimate future values of multiphase rates based on the previous behavior of the system. In this work, we demonstrate that a Long Short-Term Memory (LSTM) recurrent neural network is able not only to accurately estimate the multiphase rates at the current time (i.e., act as a virtual flow meter), but also to forecast the rates for a sequence of future time instants. For a synthetic severe slugging case, LSTM forecasts compare favorably with the results of hydrodynamical modeling. LSTM results for a realistic noisy dataset of a variable-rate well test show that the model can also successfully forecast multiphase rates for a system with changing flow patterns. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05698v1 |
PDF | http://arxiv.org/pdf/1802.05698v1.pdf |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-for-virtual-flow |
Repo | https://github.com/nikolai-andrianov/VFM |
Framework | none |
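A minimal PyTorch sketch of the virtual-flow-meter idea: an LSTM maps a window of past multiphase rates to the next values. The layer sizes and the three-rate input are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RateForecaster(nn.Module):
    """LSTM mapping a window of past multiphase rates to the next time step."""
    def __init__(self, n_rates=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_rates, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_rates)

    def forward(self, x):             # x: (batch, time, n_rates)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict the next step's rates

model = RateForecaster()
pred = model(torch.randn(8, 50, 3))   # 8 windows of 50 past time steps
```

Multi-step forecasting follows by feeding predictions back in as inputs, one step at a time.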
Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction
Title | Brain Tumor Segmentation and Tractographic Feature Extraction from Structural MR Images for Overall Survival Prediction |
Authors | Po-Yu Kao, Thuyen Ngo, Angela Zhang, Jefferson W. Chen, B. S. Manjunath |
Abstract | This paper introduces a novel methodology to integrate human brain connectomics and parcellation for brain tumor segmentation and survival prediction. For segmentation, we utilize an existing brain parcellation atlas in the MNI152 1mm space and map this parcellation to each individual subject data. We use deep neural network architectures together with hard negative mining to achieve the final voxel level classification. For survival prediction, we present a new method for combining features from connectomics data, brain parcellation information, and the brain tumor mask. We leverage the average connectome information from the Human Connectome Project and map each subject brain volume onto this common connectome space. From this, we compute tractographic features that describe potential neural disruptions due to the brain tumor. These features are then used to predict the overall survival of the subjects. The main novelty in the proposed methods is the use of normalized brain parcellation data and tractography data from the human connectome project for analyzing MR images for segmentation and survival prediction. Experimental results are reported on the BraTS2018 dataset. |
Tasks | Brain Tumor Segmentation |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07716v3 |
PDF | http://arxiv.org/pdf/1807.07716v3.pdf |
PWC | https://paperswithcode.com/paper/brain-tumor-segmentation-and-tractographic |
Repo | https://github.com/pykao/BraTS2018-tumor-segmentation |
Framework | pytorch |
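Hard negative mining, which the abstract names as part of the voxel classifier, can be sketched as keeping all positive voxels plus only the highest-loss negatives. The 3:1 negative ratio below is a common convention, not necessarily the paper's setting.

```python
import numpy as np

def hard_negative_mask(loss, labels, neg_ratio=3):
    """Select voxels to backpropagate: all positives plus the hardest negatives.

    loss and labels are flattened per-voxel arrays of the same length.
    """
    pos = labels > 0
    n_neg = neg_ratio * max(int(pos.sum()), 1)
    neg_idx = np.where(~pos)[0]
    hardest = neg_idx[np.argsort(loss[neg_idx])[-n_neg:]]  # highest-loss negatives
    mask = pos.copy()
    mask[hardest] = True
    return mask
```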
NE-Table: A Neural key-value table for Named Entities
Title | NE-Table: A Neural key-value table for Named Entities |
Authors | Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos |
Abstract | Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution for this problem, and present empirical evaluations on: a) a structured Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a Reading-Comprehension task, which show that the proposed method can be effective in dealing with both in-vocabulary and OOV NEs. We create extended versions of dialog bAbI tasks 1, 2 and 4 and OOV versions of the CBT test set, available at https://github.com/IBM/ne-table-datasets. |
Tasks | Goal-Oriented Dialog, Question Answering, Reading Comprehension, Word Embeddings |
Published | 2018-04-22 |
URL | https://arxiv.org/abs/1804.09540v2 |
PDF | https://arxiv.org/pdf/1804.09540v2.pdf |
PWC | https://paperswithcode.com/paper/named-entities-troubling-your-neural-methods |
Repo | https://github.com/IBM/ne-table-datasets |
Framework | none |
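The key-value idea can be sketched as a differentiable lookup: attend over stored NE keys with a query vector and return a mixture of their values, so entities written into the table at read time remain usable even when OOV for the word embeddings. A minimal sketch of the retrieval step only, under assumed shapes; the paper's full read/write machinery is richer.

```python
import numpy as np

def ne_table_lookup(query, keys, values):
    """Soft key-value retrieval over a table of named entities.

    query: (d,); keys, values: (n_entities, d) arrays populated as NEs are
    encountered. Returns a value mixture weighted by softmax attention.
    """
    scores = keys @ query
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ values
```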
A Scalable Discrete-Time Survival Model for Neural Networks
Title | A Scalable Discrete-Time Survival Model for Neural Networks |
Authors | Michael F. Gensheimer, Balasubramanian Narasimhan |
Abstract | There is currently great interest in applying neural networks to prediction tasks in medicine. It is important for predictive models to be able to use survival data, where each patient has a known follow-up time and event/censoring indicator. This avoids information loss when training the model and enables generation of predicted survival curves. In this paper, we describe a discrete-time survival model that is designed to be used with neural networks, which we refer to as Nnet-survival. The model is trained with the maximum likelihood method using minibatch stochastic gradient descent (SGD). The use of SGD enables rapid convergence and application to large datasets that do not fit in memory. The model is flexible, so that the baseline hazard rate and the effect of the input data on the hazard probability can vary with follow-up time. It has been implemented in the Keras deep learning framework, and source code for the model and several examples is available online. We demonstrate the performance of the model on both simulated and real data and compare it to the existing models Cox-nnet and DeepSurv. |
Tasks | Survival Analysis |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00917v3 |
PDF | http://arxiv.org/pdf/1805.00917v3.pdf |
PWC | https://paperswithcode.com/paper/a-scalable-discrete-time-survival-model-for |
Repo | https://github.com/MGensheimer/nnet-survival |
Framework | tf |
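The discrete-time likelihood behind such a model is compact: a subject contributes a survival factor for each interval lived through and, if uncensored, a hazard factor for the interval of the event. A minimal NumPy sketch under one common interval convention (conventions vary, so treat the indexing as an assumption):

```python
import numpy as np

def nll_discrete_survival(hazards, t, event):
    """Negative log-likelihood for one subject in a discrete-time model.

    hazards: per-interval conditional event probabilities from the network.
    The subject survives intervals 0..t-1 and, if event is True, fails in
    interval t; if censored, only the survived intervals contribute.
    """
    ll = np.sum(np.log(1 - hazards[:t]))
    if event:
        ll += np.log(hazards[t])
    return -ll
```

Because the loss decomposes over subjects, it drops straight into minibatch SGD, which is what enables the scalability the abstract highlights.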
LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Title | LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling |
Authors | Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang |
Abstract | Machine learning algorithms, such as the Support Vector Machine (SVM) and the Deep Neural Network (DNN), have gained a lot of interest recently. When training a machine learning algorithm, randomly shuffling all the training data can improve the testing accuracy and boost the convergence rate. Nevertheless, realizing training-data random shuffling in a real system is not straightforward, due to the slow random accesses of hard disk drives (HDDs). To avoid frequent random disk accesses, existing approaches often limit the effect of random shuffling. With emerging non-volatile memory-based storage devices, such as the Intel Optane SSD, which provide fast random accesses, we propose a lightweight implementation of random shuffling (LIRS) that randomly shuffles the indexes of the entire training dataset; the selected training instances are directly accessed from storage and packed into batches. Experimental results show that LIRS can reduce the total training time of SVM and DNN by 49.9% and 43.5% on average, and improve the final testing accuracy on DNN by 1.01%. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04509v1 |
PDF | http://arxiv.org/pdf/1810.04509v1.pdf |
PWC | https://paperswithcode.com/paper/lirs-enabling-efficient-machine-learning-on |
Repo | https://github.com/winiel559/ZhiLin-LIRS |
Framework | none |
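The core of LIRS as described is index-level shuffling: permute only the index array in memory and read each selected instance directly from fast storage. In the sketch below, `read_instance` is a hypothetical stand-in for a random read of one example from the NVM device.

```python
import numpy as np

def lirs_batches(n_examples, batch_size, read_instance, seed=0):
    """Yield shuffled minibatches while keeping only indexes in memory.

    read_instance(i) performs a random read of example i from storage; the
    full dataset is never loaded into memory.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        idx = order[start:start + batch_size]
        yield [read_instance(int(i)) for i in idx]
```

This only pays off when random reads are cheap, which is exactly the property of NVM-based storage the paper exploits.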