May 7, 2019

Paper Group ANR 131

Hierarchical Memory Networks


Title	Hierarchical Memory Networks
Authors	Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio
Abstract	Memory networks are neural networks with an explicit memory component that can be both read and written to by the network. The memory is often addressed in a soft way using a softmax function, making end-to-end training with backpropagation possible. However, this is not computationally scalable for applications which require the network to read from extremely large memories. On the other hand, it is well known that hard attention mechanisms based on reinforcement learning are challenging to train successfully. In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks. The memory is organized in a hierarchical structure such that reading from it is done with less computation than soft attention over a flat memory, while also being easier to train than hard attention over a flat memory. Specifically, we propose to incorporate Maximum Inner Product Search (MIPS) in the training and inference procedures for our hierarchical memory network. We explore the use of various state-of-the-art approximate MIPS techniques and report results on SimpleQuestions, a challenging large-scale factoid question answering task.
Tasks	Question Answering
Published	2016-05-24
URL	http://arxiv.org/abs/1605.07427v1
PDF	http://arxiv.org/pdf/1605.07427v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-memory-networks
Repo
Framework
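
The contrast drawn in the abstract above, soft attention over every memory cell versus a softmax restricted to the cells returned by an inner-product search, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the search is exact rather than one of the approximate MIPS techniques the paper explores, and nothing here is taken from the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_read(query, memory):
    """Soft attention over the whole memory; cost grows with len(memory)."""
    weights = softmax([dot(query, cell) for cell in memory])
    dim = len(memory[0])
    return [sum(w * cell[d] for w, cell in zip(weights, memory)) for d in range(dim)]


def top_k_read(query, memory, k):
    """Pick the k cells with the largest inner product, then softmax over those only.

    The exact scan below is a stand-in (assumption) for an approximate MIPS index.
    """
    scores = [dot(query, cell) for cell in memory]
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(memory[0])
    read = [sum(w * memory[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return top, read
```

With `k` equal to the memory size, `top_k_read` gives the same read vector as `soft_read`; smaller `k` restricts the softmax to the retrieved cells.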

Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data


Title	Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data
Authors	Makoto Yamada, Jiliang Tang, Jose Lugo-Martinez, Ermin Hodzic, Raunak Shrestha, Avishek Saha, Hua Ouyang, Dawei Yin, Hiroshi Mamitsuka, Cenk Sahinalp, Predrag Radivojac, Filippo Menczer, Yi Chang
Abstract	Machine learning methods are used to discover complex nonlinear relationships in biological and medical data. However, sophisticated learning models are computationally infeasible for data with millions of features. Here we introduce the first feature selection method for nonlinear learning problems that can scale up to large, ultra-high dimensional biological data. More specifically, we scale up the novel Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) to handle millions of features with tens of thousands of samples. The proposed method is guaranteed to find an optimal subset of maximally predictive features with minimal redundancy, yielding higher predictive power and improved interpretability. Its effectiveness is demonstrated through applications to classify phenotypes based on module expression in human prostate cancer patients and to detect enzymes among protein structures. We achieve high accuracy with as few as 20 out of one million features, a dimensionality reduction of 99.998%. Our algorithm can be implemented on commodity cloud computing platforms. The dramatic reduction of features may lead to the ubiquitous deployment of sophisticated prediction models in mobile health care applications.
Tasks	Dimensionality Reduction, Feature Selection
Published	2016-08-14
URL	http://arxiv.org/abs/1608.04048v1
PDF	http://arxiv.org/pdf/1608.04048v1.pdf
PWC	https://paperswithcode.com/paper/ultra-high-dimensional-nonlinear-feature
Repo
Framework
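
The 99.998% figure quoted in the abstract above follows directly from keeping 20 of one million features. A quick check (the helper name is hypothetical):

```python
def reduction_percent(selected, total):
    """Percentage of features discarded when `selected` of `total` are kept."""
    return 100.0 * (1.0 - selected / total)


# 20 features kept out of one million, as stated in the abstract.
print(f"{reduction_percent(20, 1_000_000):.3f}%")
```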

Ask the GRU: Multi-Task Learning for Deep Text Recommendations


Title	Ask the GRU: Multi-Task Learning for Deep Text Recommendations
Authors	Trapit Bansal, David Belanger, Andrew McCallum
Abstract	In a variety of application domains the content to be recommended to users is associated with text. This includes research papers, movies with associated plot summaries, news articles, blog posts, etc. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen content, and may generalize better, since the factors for all items are produced by a compactly-parametrized model. Previous work has used topic models or averages of word embeddings for this mapping. In this paper we present a method leveraging deep recurrent neural networks to encode the text sequence into a latent vector, specifically gated recurrent units (GRUs) trained end-to-end on the collaborative filtering task. For the task of scientific paper recommendation, this yields models with significantly higher accuracy. In cold-start scenarios, we beat the previous state-of-the-art, all of which ignore word order. Performance is further improved by multi-task learning, where the text encoder network is trained for a combination of content recommendation and item metadata prediction. This regularizes the collaborative filtering model, ameliorating the problem of sparsity of the observed rating matrix.
Tasks	Multi-Task Learning, Topic Models, Word Embeddings
Published	2016-09-07
URL	http://arxiv.org/abs/1609.02116v2
PDF	http://arxiv.org/pdf/1609.02116v2.pdf
PWC	https://paperswithcode.com/paper/ask-the-gru-multi-task-learning-for-deep-text
Repo
Framework

A Machine Learning Nowcasting Method based on Real-time Reanalysis Data


Title	A Machine Learning Nowcasting Method based on Real-time Reanalysis Data
Authors	Lei Han, Juanzhen Sun, Wei Zhang, Yuanyuan Xiu, Hailei Feng, Yinjing Lin
Abstract	Despite marked progress over the past several decades, convective storm nowcasting remains a challenge because most nowcasting systems are based on linear extrapolation of radar reflectivity without much consideration for other meteorological fields. The variational Doppler radar analysis system (VDRAS) is an advanced convective-scale analysis system capable of providing analysis of 3-D wind, temperature, and humidity by assimilating Doppler radar observations. Although potentially useful, it is still an open question as to how to use these fields to improve nowcasting. In this study, we present results from our first attempt at developing a Support Vector Machine (SVM) Box-based nOWcasting (SBOW) method under the machine learning framework using VDRAS analysis data. The key design points of SBOW are as follows: 1) The study domain is divided into many position-fixed small boxes and the nowcasting problem is transformed into one question, i.e., will a radar echo > 35 dBZ appear in a box in 30 minutes? 2) Box-based temporal and spatial features, which include time trends and surrounding environmental information, are elaborately constructed, and 3) The box-based constructed features are used to first train the SVM classifier, and then the trained classifier is used to make predictions. Compared with complicated and expensive expert systems, the above design of SBOW allows the system to be small, compact, straightforward, and easy to maintain and expand at low cost. The experimental results show that, although no complicated tracking algorithm is used, SBOW can predict the storm movement trend and storm growth with reasonable skill.
Tasks
Published	2016-09-14
URL	https://arxiv.org/abs/1609.04103v2
PDF	https://arxiv.org/pdf/1609.04103v2.pdf
PWC	https://paperswithcode.com/paper/a-machine-learning-nowcasting-method-based-on
Repo
Framework
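
Design point 1 in the abstract above turns nowcasting into a per-box yes/no question. One way the training labels could be derived is sketched below, assuming the reflectivity field valid 30 minutes after the feature time is available as a 2-D grid. The function name, the grid layout, and the `box_size` parameter are illustrative assumptions, not details from the paper.

```python
THRESHOLD_DBZ = 35.0  # echo threshold quoted in the abstract


def box_labels(reflectivity, box_size):
    """Label each position-fixed box 1 if any cell exceeds the threshold, else 0.

    reflectivity: 2-D list (rows x cols) of dBZ values at the target time.
    box_size: side length of each square box in grid cells (hypothetical).
    """
    rows, cols = len(reflectivity), len(reflectivity[0])
    labels = []
    for r0 in range(0, rows, box_size):
        row_labels = []
        for c0 in range(0, cols, box_size):
            # Collect the cells of this box, clipping at the domain edge.
            cells = [
                reflectivity[r][c]
                for r in range(r0, min(r0 + box_size, rows))
                for c in range(c0, min(c0 + box_size, cols))
            ]
            row_labels.append(1 if max(cells) > THRESHOLD_DBZ else 0)
        labels.append(row_labels)
    return labels
```

Each label would then be paired with the box-based temporal and spatial features from the earlier analysis time to form one SVM training example.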

Environmental Noise Embeddings for Robust Speech Recognition


Title	Environmental Noise Embeddings for Robust Speech Recognition
Authors	Suyoun Kim, Bhiksha Raj, Ian Lane
Abstract	We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on Resource Management, CHiME-3 task, and Aurora4, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, i-vector framework, and multi-task learning on both in-domain noise and unseen noise.
Tasks	Multi-Task Learning, Robust Speech Recognition, Speech Recognition
Published	2016-01-11
URL	http://arxiv.org/abs/1601.02553v2
PDF	http://arxiv.org/pdf/1601.02553v2.pdf
PWC	https://paperswithcode.com/paper/environmental-noise-embeddings-for-robust
Repo
Framework

DeepSoft: A vision for a deep model of software


Title	DeepSoft: A vision for a deep model of software
Authors	Hoa Khanh Dam, Truyen Tran, John Grundy, Aditya Ghose
Abstract	Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and mainly addresses traditional classification problems, as opposed to predicting future events. We present a vision for DeepSoft, an end-to-end generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.
Tasks	Feature Engineering
Published	2016-07-30
URL	http://arxiv.org/abs/1608.00092v1
PDF	http://arxiv.org/pdf/1608.00092v1.pdf
PWC	https://paperswithcode.com/paper/deepsoft-a-vision-for-a-deep-model-of
Repo
Framework

FVQA: Fact-based Visual Question Answering


Title	FVQA: Fact-based Visual Question Answering
Authors	Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick
Abstract	Visual Question Answering (VQA) has attracted a lot of attention in both Computer Vision and Natural Language Processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of such questions that require no external information to answer is interesting, but very limited. It excludes questions which require common sense, or basic factual knowledge to answer, for example. Here we introduce FVQA, a VQA dataset which requires, and supports, much deeper reasoning. FVQA only contains questions which require external information to answer. We thus extend a conventional visual question answering dataset, which contains image-question-answer triplets, through additional image-question-answer-supporting fact tuples. The supporting fact is represented as a structural triplet, such as <Cat,CapableOf,ClimbingTrees>. We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.
Tasks	Common Sense Reasoning, Question Answering, Visual Question Answering
Published	2016-06-17
URL	http://arxiv.org/abs/1606.05433v4
PDF	http://arxiv.org/pdf/1606.05433v4.pdf
PWC	https://paperswithcode.com/paper/fvqa-fact-based-visual-question-answering
Repo
Framework

DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation


Title	DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation
Authors	Victor Dorobantu, Per Andre Stromhaug, Jess Renteria
Abstract	The vanishing and exploding gradient problems are well-studied obstacles that make it difficult for recurrent neural networks to learn long-term time dependencies. We propose a reparameterization of standard recurrent neural networks to update linear transformations in a provably norm-preserving way through Givens rotations. Additionally, we use the absolute value function as an element-wise non-linearity to preserve the norm of backpropagated signals over the entire network. We show that this reparameterization reduces the number of parameters and maintains the same algorithmic complexity as a standard recurrent neural network, while outperforming standard recurrent neural networks with orthogonal initializations and Long Short-Term Memory networks on the copy problem.
Tasks
Published	2016-12-13
URL	http://arxiv.org/abs/1612.04035v1
PDF	http://arxiv.org/pdf/1612.04035v1.pdf
PWC	https://paperswithcode.com/paper/dizzyrnn-reparameterizing-recurrent-neural
Repo
Framework
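
The two norm-preserving ingredients named in the abstract above, Givens rotations and the element-wise absolute value, can be checked numerically. The sketch below applies a single rotation to a small vector; the function names are hypothetical and this is not the paper's parameterization of a full recurrent weight matrix.

```python
import math


def givens_rotate(vec, i, j, theta):
    """Rotate vec by angle theta in the (i, j) plane; other coordinates are unchanged."""
    out = list(vec)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def norm(vec):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in vec))


def abs_activation(vec):
    """Element-wise absolute value; leaves the Euclidean norm unchanged."""
    return [abs(x) for x in vec]


def abs_backward(pre_activation, grad_out):
    """Backpropagate through abs: each gradient entry is multiplied by +1 or -1.

    The derivative at exactly 0 is taken as +1 here (a convention, assumed).
    """
    return [g if x >= 0 else -g for x, g in zip(pre_activation, grad_out)]
```

Because each step only rotates or flips signs, both the hidden state and the backpropagated gradient keep their Euclidean norm through this pair of operations.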

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns


Title	A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns
Authors	Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil
Abstract	Management of chronic diseases such as heart failure, diabetes, and chronic obstructive pulmonary disease (COPD) is a major problem in health care. A standard approach that the medical community has devised to manage widely prevalent chronic diseases such as chronic heart failure (CHF) is to have a committee of experts develop guidelines that all physicians should follow. These guidelines typically consist of a series of complex rules that make recommendations based on a patient’s information. Due to their complexity, often the guidelines are either ignored or not complied with at all, which can result in poor medical practices. It is not even clear whether it is humanly possible to follow these guidelines due to their length and complexity. In the case of CHF management, the guidelines run nearly 80 pages. In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming (ASP). Our approach is based on developing reasoning templates (that we call knowledge patterns) and using these patterns to systematically code the clinical guidelines for CHF as ASP rules. Use of the knowledge patterns greatly facilitates the development of our system. Given a patient’s medical information, our system generates a recommendation for treatment just as a human physician would, using the guidelines. Our system will work even in the presence of incomplete information. Our work makes two contributions: (i) it shows that highly complex guidelines can be successfully coded as ASP rules, and (ii) it develops a series of knowledge patterns that facilitate the coding of knowledge expressed in a natural language and that can be used for other application domains. This paper is under consideration for acceptance in TPLP.
Tasks
Published	2016-10-25
URL	http://arxiv.org/abs/1610.08115v1
PDF	http://arxiv.org/pdf/1610.08115v1.pdf
PWC	https://paperswithcode.com/paper/a-physician-advisory-system-for-chronic-heart
Repo
Framework

Font Identification in Historical Documents Using Active Learning


Title	Font Identification in Historical Documents Using Active Learning
Authors	Anshul Gupta, Ricardo Gutierrez-Osuna, Matthew Christy, Richard Furuta, Laura Mandell
Abstract	Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-word representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents.
Tasks	Active Learning, Optical Character Recognition
Published	2016-01-27
URL	http://arxiv.org/abs/1601.07252v1
PDF	http://arxiv.org/pdf/1601.07252v1.pdf
PWC	https://paperswithcode.com/paper/font-identification-in-historical-documents
Repo
Framework
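
The abstract above does not say how uncertainty is measured. As a minimal sketch of the general idea, the code below ranks unlabeled pages by the entropy of a classifier's predicted class probabilities and returns the most uncertain ones for labeling. The entropy criterion and the function names are assumptions, and the diversity and dissimilarity criteria the paper also evaluates are not covered.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, n):
    """Return the indices of the n samples with the highest predictive entropy.

    predictions: list of per-sample class-probability lists, for example over
    the Blackletter, Roman and Mixed classes mentioned in the abstract.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:n]
```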

Representation learning for very short texts using weighted word embedding aggregation


Title	Representation learning for very short texts using weighted word embedding aggregation
Authors	Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt
Abstract	Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.
Tasks	Opinion Mining, Representation Learning, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published	2016-07-02
URL	http://arxiv.org/abs/1607.00570v1
PDF	http://arxiv.org/pdf/1607.00570v1.pdf
PWC	https://paperswithcode.com/paper/representation-learning-for-very-short-texts
Repo
Framework
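
A weight-based aggregation of word embeddings, as described in the abstract above, might look roughly like the sketch below. The abstract does not give the weighting scheme, so ordering words by idf and applying one weight per rank is an assumption made for illustration. The weights are fixed hypothetical numbers, whereas the paper learns them with its median-based loss.

```python
def aggregate(tokens, embeddings, idf, rank_weights):
    """Weighted average of word vectors, with words ordered by descending idf.

    tokens: words of the short text.
    embeddings: dict mapping word -> vector (list of floats).
    idf: dict mapping word -> inverse document frequency.
    rank_weights: one weight per idf rank (hypothetical values, not learned here).
    """
    known = [t for t in tokens if t in embeddings]
    known.sort(key=lambda t: idf.get(t, 0.0), reverse=True)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    used = list(zip(known, rank_weights))  # words beyond the weight list are dropped
    for token, weight in used:
        for d in range(dim):
            total[d] += weight * embeddings[token][d]
    count = len(used) or 1  # avoid division by zero for empty input
    return [x / count for x in total]
```

Two short texts can then be compared by, for example, the cosine similarity of their aggregated vectors.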

Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization


Title	Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization
Authors	Minsoo Kim, Moirangthem Dennis Singh, Minho Lee
Abstract	In this work, we introduce temporal hierarchies to the sequence to sequence (seq2seq) model to tackle the problem of abstractive summarization of scientific articles. The proposed Multiple Timescale model of the Gated Recurrent Unit (MTGRU) is implemented in the encoder-decoder setting to better deal with the presence of multiple compositionalities in larger texts. The proposed model is compared to the conventional RNN encoder-decoder, and the results demonstrate that our model trains faster and shows significant performance gains. The results also show that the temporal hierarchies help improve the ability of seq2seq models to capture compositionalities better without the presence of highly complex architectural hierarchies.
Tasks	Abstractive Text Summarization
Published	2016-07-04
URL	http://arxiv.org/abs/1607.00718v1
PDF	http://arxiv.org/pdf/1607.00718v1.pdf
PWC	https://paperswithcode.com/paper/towards-abstraction-from-extraction-multiple
Repo
Framework

Resource Constrained Structured Prediction


Title	Resource Constrained Structured Prediction
Authors	Tolga Bolukbasi, Kai-Wei Chang, Joseph Wang, Venkatesh Saligrama
Abstract	We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reducing feature costs without degrading accuracy.
Tasks	Dependency Parsing, Optical Character Recognition, Structured Prediction
Published	2016-02-28
URL	http://arxiv.org/abs/1602.08761v2
PDF	http://arxiv.org/pdf/1602.08761v2.pdf
PWC	https://paperswithcode.com/paper/resource-constrained-structured-prediction
Repo
Framework

Supervised multiview learning based on simultaneous learning of multiview intact and single view classifier


Title	Supervised multiview learning based on simultaneous learning of multiview intact and single view classifier
Authors	Qingjun Wang, Haiyan Lv, Jun Yue, Eugene Mitchell
Abstract	The multiview learning problem refers to the problem of learning a classifier from multiple-view data. In such a data set, each data point is represented by multiple different views. In this paper, we propose a novel method for this problem. This method is based on two assumptions. The first assumption is that each data point has an intact feature vector, and each view is obtained by a linear transformation from the intact vector. The second assumption is that the intact vectors are discriminative, and in the intact space, we have a linear classifier to separate the positive class from the negative class. We define an intact vector for each data point, and a view-conditional transformation matrix for each view, and propose to reconstruct the multiple view feature vectors by the product of the corresponding intact vectors and transformation matrices. Moreover, we also propose a linear classifier in the intact space, and learn it jointly with the intact vectors. The learning problem is modeled as a minimization problem, and the objective function is composed of a Cauchy error estimator-based view-conditional reconstruction term over all data points and views, and a classification error term measured by hinge loss over all the intact vectors of all the data points. Some regularization terms are also imposed on different variables in the objective function. The minimization problem is solved by an iterative algorithm using an alternating optimization strategy and a gradient descent algorithm. The proposed algorithm shows its advantage in comparison to other multiview learning algorithms on benchmark data sets.
Tasks	Multiview Learning
Published	2016-01-09
URL	http://arxiv.org/abs/1601.02098v1
PDF	http://arxiv.org/pdf/1601.02098v1.pdf
PWC	https://paperswithcode.com/paper/supervised-multiview-learning-based-on
Repo
Framework
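
The objective described in words in the abstract above could be written along the following lines. All symbols are assumptions introduced for illustration: x_i^j is the feature vector of data point i in view j, y_i its intact vector, W_j the transformation matrix of view j, w the linear classifier, t_i in {-1, +1} the class label, and c and C constants. The Cauchy estimator is taken in its common form log(1 + r^2/c^2), and the regularization terms are left unspecified because the abstract does not state them.

```latex
\min_{\{y_i\},\,\{W_j\},\,w}\;
\sum_{i=1}^{n}\sum_{j=1}^{m}
  \log\!\left(1 + \frac{\lVert x_i^{j} - W_j\, y_i \rVert_2^{2}}{c^{2}}\right)
\;+\; C \sum_{i=1}^{n} \max\!\left(0,\; 1 - t_i\, w^{\top} y_i\right)
\;+\; \text{(regularization terms)}
```

The first sum is the view-conditional reconstruction term over all data points and views, and the second is the hinge loss over the intact vectors.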

Inducing Interpretable Representations with Variational Autoencoders


Title	Inducing Interpretable Representations with Variational Autoencoders
Authors	N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H. S. Torr
Abstract	We develop a framework for incorporating structured graphical models in the encoders of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes.
Tasks
Published	2016-11-22
URL	http://arxiv.org/abs/1611.07492v1
PDF	http://arxiv.org/pdf/1611.07492v1.pdf
PWC	https://paperswithcode.com/paper/inducing-interpretable-representations-with
Repo
Framework