Paper Group ANR 131
Hierarchical Memory Networks. Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data. Ask the GRU: Multi-Task Learning for Deep Text Recommendations. A Machine Learning Nowcasting Method based on Real-time Reanalysis Data. Environmental Noise Embeddings for Robust Speech Recognition. DeepSoft: A vision for a deep model of software. FVQA: Fact-based Visual Question Answering. DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation. A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns. Font Identification in Historical Documents Using Active Learning. Representation learning for very short texts using weighted word embedding aggregation. Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization. Resource Constrained Structured Prediction. Supervised multiview learning based on simultaneous learning of multiview intact and single view classifier. Inducing Interpretable Representations with Variational Autoencoders.
Hierarchical Memory Networks
Title | Hierarchical Memory Networks |
Authors | Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio |
Abstract | Memory networks are neural networks with an explicit memory component that can be both read and written to by the network. The memory is often addressed in a soft way using a softmax function, making end-to-end training with backpropagation possible. However, this is not computationally scalable for applications which require the network to read from extremely large memories. On the other hand, it is well known that hard attention mechanisms based on reinforcement learning are challenging to train successfully. In this paper, we explore a form of hierarchical memory network, which can be considered as a hybrid between hard and soft attention memory networks. The memory is organized in a hierarchical structure such that reading from it is done with less computation than soft attention over a flat memory, while also being easier to train than hard attention over a flat memory. Specifically, we propose to incorporate Maximum Inner Product Search (MIPS) in the training and inference procedures for our hierarchical memory network. We explore the use of various state-of-the-art approximate MIPS techniques and report results on SimpleQuestions, a challenging large scale factoid question answering task. |
Tasks | Question Answering |
Published | 2016-05-24 |
URL | http://arxiv.org/abs/1605.07427v1 |
http://arxiv.org/pdf/1605.07427v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-memory-networks |
Repo | |
Framework | |
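To make the memory-read idea above concrete, here is a minimal numpy sketch of a K-MIPS-style read: soft attention is computed only over the top-k memory slots ranked by inner product with the query. The memory contents, dimensions, and the use of exact top-k search (in place of the approximate MIPS structures such as hashing or clustering explored in the paper) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kmips_read(query, memory, k=10):
    """Soft attention restricted to the top-k memory slots by inner product.

    Exact top-k search stands in for the approximate MIPS index
    (hashing / tree / clustering) used in the paper.
    """
    scores = memory @ query                      # inner products with every slot
    top = np.argpartition(-scores, k)[:k]        # candidate set from (approximate) MIPS
    weights = softmax(scores[top])               # softmax only over the candidates
    return weights @ memory[top], top            # read vector and attended slots

# Illustrative usage with random memory and query (assumed sizes).
rng = np.random.default_rng(0)
memory = rng.standard_normal((100_000, 64))      # 100k facts, 64-d embeddings
query = rng.standard_normal(64)
read_vector, slots = kmips_read(query, memory, k=10)
print(read_vector.shape, slots.shape)            # (64,) (10,)
```

The point of the restriction is that the softmax and its gradients touch only k slots instead of the full memory.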
Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data
Title | Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data |
Authors | Makoto Yamada, Jiliang Tang, Jose Lugo-Martinez, Ermin Hodzic, Raunak Shrestha, Avishek Saha, Hua Ouyang, Dawei Yin, Hiroshi Mamitsuka, Cenk Sahinalp, Predrag Radivojac, Filippo Menczer, Yi Chang |
Abstract | Machine learning methods are used to discover complex nonlinear relationships in biological and medical data. However, sophisticated learning models are computationally infeasible for data with millions of features. Here we introduce the first feature selection method for nonlinear learning problems that can scale up to large, ultra-high dimensional biological data. More specifically, we scale up the novel Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) to handle millions of features with tens of thousands of samples. The proposed method is guaranteed to find an optimal subset of maximally predictive features with minimal redundancy, yielding higher predictive power and improved interpretability. Its effectiveness is demonstrated through applications to classify phenotypes based on module expression in human prostate cancer patients and to detect enzymes among protein structures. We achieve high accuracy with as few as 20 out of one million features, a dimensionality reduction of 99.998%. Our algorithm can be implemented on commodity cloud computing platforms. The dramatic reduction of features may lead to the ubiquitous deployment of sophisticated prediction models in mobile health care applications. |
Tasks | Dimensionality Reduction, Feature Selection |
Published | 2016-08-14 |
URL | http://arxiv.org/abs/1608.04048v1 |
http://arxiv.org/pdf/1608.04048v1.pdf | |
PWC | https://paperswithcode.com/paper/ultra-high-dimensional-nonlinear-feature |
Repo | |
Framework | |
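The core HSIC Lasso criterion can be written down at toy scale as a non-negative lasso over per-feature centered Gram matrices. The scikit-learn sketch below illustrates only that criterion; the Gaussian kernel bandwidth, the regularization strength, and the toy data are assumptions, and the paper's distributed, million-feature scaling is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

def centered_gram(v, gamma=1.0):
    """Centered Gaussian Gram matrix of a single variable (column vector)."""
    K = rbf_kernel(v.reshape(-1, 1), gamma=gamma)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def hsic_lasso(X, y, lam=1e-3, gamma=1.0):
    """Small-scale HSIC Lasso: non-negative lasso over per-feature centered Gram matrices."""
    n, d = X.shape
    L = centered_gram(y.astype(float), gamma=gamma).ravel()
    Ks = np.column_stack([centered_gram(X[:, j], gamma=gamma).ravel() for j in range(d)])
    model = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=10_000)
    model.fit(Ks, L)
    return model.coef_                           # nonzero weights mark selected features

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1).astype(int)   # depends nonlinearly on 2 features
weights = hsic_lasso(X, y)
print(np.argsort(-weights)[:5])                  # indices of top-ranked features
```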
Ask the GRU: Multi-Task Learning for Deep Text Recommendations
Title | Ask the GRU: Multi-Task Learning for Deep Text Recommendations |
Authors | Trapit Bansal, David Belanger, Andrew McCallum |
Abstract | In a variety of application domains the content to be recommended to users is associated with text. This includes research papers, movies with associated plot summaries, news articles, blog posts, etc. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen content, and may generalize better, since the factors for all items are produced by a compactly-parametrized model. Previous work has used topic models or averages of word embeddings for this mapping. In this paper we present a method leveraging deep recurrent neural networks to encode the text sequence into a latent vector, specifically gated recurrent units (GRUs) trained end-to-end on the collaborative filtering task. For the task of scientific paper recommendation, this yields models with significantly higher accuracy. In cold-start scenarios, we beat the previous state-of-the-art, all of which ignore word order. Performance is further improved by multi-task learning, where the text encoder network is trained for a combination of content recommendation and item metadata prediction. This regularizes the collaborative filtering model, ameliorating the problem of sparsity of the observed rating matrix. |
Tasks | Multi-Task Learning, Topic Models, Word Embeddings |
Published | 2016-09-07 |
URL | http://arxiv.org/abs/1609.02116v2 |
http://arxiv.org/pdf/1609.02116v2.pdf | |
PWC | https://paperswithcode.com/paper/ask-the-gru-multi-task-learning-for-deep-text |
Repo | |
Framework | |
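A minimal PyTorch sketch of the idea: item factors are produced by a GRU over the item's text and scored against free user factors, with a second head standing in for the metadata-prediction task. The layer sizes, the single tag head, and the dot-product scoring are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GRUTextRecommender(nn.Module):
    """Item factors come from a GRU over the item's text; users keep free factors.

    Score(user, item) = user_factor . GRU(item_text); a second head predicting
    item tags plays the role of the multi-task (metadata) objective.
    """
    def __init__(self, n_users, vocab_size, n_tags, dim=64):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, dim)
        self.word_emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.tag_head = nn.Linear(dim, n_tags)

    def forward(self, user_ids, item_tokens):
        _, h = self.gru(self.word_emb(item_tokens))   # h: (1, batch, dim)
        item_vec = h.squeeze(0)
        score = (self.user_factors(user_ids) * item_vec).sum(-1)
        return score, self.tag_head(item_vec)

model = GRUTextRecommender(n_users=1000, vocab_size=5000, n_tags=20)
users = torch.tensor([3, 7])
texts = torch.randint(1, 5000, (2, 30))               # two items, 30 tokens each
score, tag_logits = model(users, texts)
print(score.shape, tag_logits.shape)                  # torch.Size([2]) torch.Size([2, 20])
```

Both heads would be trained jointly, so the rating loss and the tag-prediction loss regularize the shared text encoder.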
A Machine Learning Nowcasting Method based on Real-time Reanalysis Data
Title | A Machine Learning Nowcasting Method based on Real-time Reanalysis Data |
Authors | Lei Han, Juanzhen Sun, Wei Zhang, Yuanyuan Xiu, Hailei Feng, Yinjing Lin |
Abstract | Despite marked progress over the past several decades, convective storm nowcasting remains a challenge because most nowcasting systems are based on linear extrapolation of radar reflectivity without much consideration for other meteorological fields. The variational Doppler radar analysis system (VDRAS) is an advanced convective-scale analysis system capable of providing analysis of 3-D wind, temperature, and humidity by assimilating Doppler radar observations. Although potentially useful, it is still an open question as to how to use these fields to improve nowcasting. In this study, we present results from our first attempt at developing a Support Vector Machine (SVM) Box-based nOWcasting (SBOW) method under the machine learning framework using VDRAS analysis data. The key design points of SBOW are as follows: 1) The study domain is divided into many position-fixed small boxes and the nowcasting problem is transformed into one question, i.e., will a radar echo > 35 dBZ appear in a box in 30 minutes? 2) Box-based temporal and spatial features, which include time trends and surrounding environmental information, are elaborately constructed, and 3) The box-based constructed features are used to first train the SVM classifier, and then the trained classifier is used to make predictions. Compared with complicated and expensive expert systems, the above design of SBOW allows the system to be small, compact, straightforward, and easy to maintain and expand at low cost. The experimental results show that, although no complicated tracking algorithm is used, SBOW can predict the storm movement trend and storm growth with reasonable skill. |
Tasks | |
Published | 2016-09-14 |
URL | https://arxiv.org/abs/1609.04103v2 |
https://arxiv.org/pdf/1609.04103v2.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-nowcasting-method-based-on |
Repo | |
Framework | |
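Since SBOW reduces nowcasting to per-box binary classification, the pipeline skeleton is "build box features, train an SVM". The scikit-learn sketch below shows that skeleton on random placeholder data; the 24 feature columns and the label rule are invented stand-ins for the VDRAS-derived box features and the "echo > 35 dBZ in 30 minutes" label.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Hypothetical box-based features: each row describes one fixed box at one time
# (e.g. current reflectivity, its time trend, and wind/temperature/humidity
# statistics from surrounding boxes); the label says whether reflectivity
# exceeds 35 dBZ in that box 30 minutes later. Data here is a random placeholder.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 24))            # 24 assumed box features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(5000) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # the SVM classifier at the core of SBOW
clf.fit(X_tr, y_tr)
print("accuracy on held-out boxes:", clf.score(X_te, y_te))
```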
Environmental Noise Embeddings for Robust Speech Recognition
Title | Environmental Noise Embeddings for Robust Speech Recognition |
Authors | Suyoun Kim, Bhiksha Raj, Ian Lane |
Abstract | We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on the Resource Management, CHiME-3, and Aurora4 tasks, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain noise and unseen noise. |
Tasks | Multi-Task Learning, Robust Speech Recognition, Speech Recognition |
Published | 2016-01-11 |
URL | http://arxiv.org/abs/1601.02553v2 |
http://arxiv.org/pdf/1601.02553v2.pdf | |
PWC | https://paperswithcode.com/paper/environmental-noise-embeddings-for-robust |
Repo | |
Framework | |
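A small PyTorch sketch of the two-network setup: a noise-classification DNN whose bottleneck activation is concatenated with the acoustic features before they enter the acoustic model. The layer sizes, bottleneck width, environment count, and senone count are assumed values, and both networks are reduced to a few dense layers for brevity.

```python
import torch
import torch.nn as nn

class NoiseEmbedder(nn.Module):
    """DNN trained to classify the acoustic environment; the bottleneck activation
    is reused as a noise embedding."""
    def __init__(self, feat_dim=40, n_envs=6, bottleneck=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, bottleneck), nn.ReLU())
        self.env_head = nn.Linear(bottleneck, n_envs)

    def forward(self, feats):
        emb = self.body(feats)
        return emb, self.env_head(emb)

feat_dim, n_states = 40, 2000                  # assumed filterbank dim / senone count
embedder = NoiseEmbedder(feat_dim)
acoustic_model = nn.Sequential(nn.Linear(feat_dim + 32, 512), nn.ReLU(),
                               nn.Linear(512, n_states))

frames = torch.randn(8, feat_dim)              # a batch of acoustic feature frames
noise_emb, env_logits = embedder(frames)
state_logits = acoustic_model(torch.cat([frames, noise_emb], dim=-1))
print(state_logits.shape)                      # torch.Size([8, 2000])
```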
DeepSoft: A vision for a deep model of software
Title | DeepSoft: A vision for a deep model of software |
Authors | Hoa Khanh Dam, Truyen Tran, John Grundy, Aditya Ghose |
Abstract | Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and they mainly address the traditional classification problems, as opposed to predicting future events. We present a vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports. |
Tasks | Feature Engineering |
Published | 2016-07-30 |
URL | http://arxiv.org/abs/1608.00092v1 |
http://arxiv.org/pdf/1608.00092v1.pdf | |
PWC | https://paperswithcode.com/paper/deepsoft-a-vision-for-a-deep-model-of |
Repo | |
Framework | |
FVQA: Fact-based Visual Question Answering
Title | FVQA: Fact-based Visual Question Answering |
Authors | Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick |
Abstract | Visual Question Answering (VQA) has attracted a lot of attention in both Computer Vision and Natural Language Processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of such questions that require no external information to answer is interesting, but very limited. It excludes questions which require common sense, or basic factual knowledge to answer, for example. Here we introduce FVQA, a VQA dataset which requires, and supports, much deeper reasoning. FVQA only contains questions which require external information to answer. We thus extend a conventional visual question answering dataset, which contains image-question-answer triplets, through additional image-question-answer-supporting fact tuples. The supporting fact is represented as a structural triplet, such as <Cat,CapableOf,ClimbingTrees>. We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts. |
Tasks | Common Sense Reasoning, Question Answering, Visual Question Answering |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05433v4 |
http://arxiv.org/pdf/1606.05433v4.pdf | |
PWC | https://paperswithcode.com/paper/fvqa-fact-based-visual-question-answering |
Repo | |
Framework | |
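The supporting facts are plain <subject, relation, object> triplets, so the data side is easy to picture. Below is a toy Python sketch of storing such facts and retrieving candidates whose subject matches visual concepts detected in an image; the knowledge base and detector output are made up, and the paper's actual reasoning model is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str

# Toy knowledge base in the <subject, relation, object> form used by FVQA.
kb = [Fact("Cat", "CapableOf", "ClimbingTrees"),
      Fact("Dog", "CapableOf", "Barking"),
      Fact("Umbrella", "UsedFor", "Rain")]

def facts_for_concepts(concepts, kb):
    """Return candidate supporting facts whose subject matches a detected visual concept."""
    return [f for f in kb if f.subject in concepts]

detected = {"Cat", "Tree"}                     # hypothetical detector output for an image
for fact in facts_for_concepts(detected, kb):
    print(f"<{fact.subject},{fact.relation},{fact.obj}>")
```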
DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation
Title | DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation |
Authors | Victor Dorobantu, Per Andre Stromhaug, Jess Renteria |
Abstract | The vanishing and exploding gradient problems are well-studied obstacles that make it difficult for recurrent neural networks to learn long-term time dependencies. We propose a reparameterization of standard recurrent neural networks to update linear transformations in a provably norm-preserving way through Givens rotations. Additionally, we use the absolute value function as an element-wise non-linearity to preserve the norm of backpropagated signals over the entire network. We show that this reparameterization reduces the number of parameters and maintains the same algorithmic complexity as a standard recurrent neural network, while outperforming standard recurrent neural networks with orthogonal initializations and Long Short-Term Memory networks on the copy problem. |
Tasks | |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04035v1 |
http://arxiv.org/pdf/1612.04035v1.pdf | |
PWC | https://paperswithcode.com/paper/dizzyrnn-reparameterizing-recurrent-neural |
Repo | |
Framework | |
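The norm-preservation argument is easy to verify numerically: a product of Givens rotations is orthogonal, and the absolute-value nonlinearity leaves vector norms unchanged. The numpy sketch below demonstrates exactly that on a random hidden state; it omits the input term and training, and the parameterization details are simplified relative to the paper.

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n Givens rotation in the (i, j) plane: orthogonal, hence norm-preserving."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G

def recurrent_matrix(thetas, n):
    """Compose one rotation per (i, j) pair; the product stays orthogonal."""
    W = np.eye(n)
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            W = givens(n, i, j, thetas[k]) @ W
            k += 1
    return W

n = 6
rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, n * (n - 1) // 2)   # the learnable angles
W = recurrent_matrix(thetas, n)

h = rng.standard_normal(n)
print("initial norm:", np.linalg.norm(h))
for _ in range(100):                                   # recurrent update with |.| nonlinearity
    h = np.abs(W @ h)                                  # both steps preserve the norm
print("norm after 100 steps:", np.linalg.norm(h))      # identical: no vanishing/exploding
```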
A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns
Title | A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns |
Authors | Zhuo Chen, Kyle Marple, Elmer Salazar, Gopal Gupta, Lakshman Tamil |
Abstract | Management of chronic diseases such as heart failure, diabetes, and chronic obstructive pulmonary disease (COPD) is a major problem in health care. A standard approach that the medical community has devised to manage widely prevalent chronic diseases such as chronic heart failure (CHF) is to have a committee of experts develop guidelines that all physicians should follow. These guidelines typically consist of a series of complex rules that make recommendations based on a patient’s information. Due to their complexity, often the guidelines are either ignored or not complied with at all, which can result in poor medical practices. It is not even clear whether it is humanly possible to follow these guidelines due to their length and complexity. In the case of CHF management, the guidelines run nearly 80 pages. In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming. Our approach is based on developing reasoning templates (that we call knowledge patterns) and using these patterns to systematically code the clinical guidelines for CHF as ASP rules. Use of the knowledge patterns greatly facilitates the development of our system. Given a patient’s medical information, our system generates a recommendation for treatment just as a human physician would, using the guidelines. Our system will work even in the presence of incomplete information. Our work makes two contributions: (i) it shows that highly complex guidelines can be successfully coded as ASP rules, and (ii) it develops a series of knowledge patterns that facilitate the coding of knowledge expressed in a natural language and that can be used for other application domains. This paper is under consideration for acceptance in TPLP. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.08115v1 |
http://arxiv.org/pdf/1610.08115v1.pdf | |
PWC | https://paperswithcode.com/paper/a-physician-advisory-system-for-chronic-heart |
Repo | |
Framework | |
Font Identification in Historical Documents Using Active Learning
Title | Font Identification in Historical Documents Using Active Learning |
Authors | Anshul Gupta, Ricardo Gutierrez-Osuna, Matthew Christy, Richard Furuta, Laura Mandell |
Abstract | Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. Towards this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-word representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents. |
Tasks | Active Learning, Optical Character Recognition |
Published | 2016-01-27 |
URL | http://arxiv.org/abs/1601.07252v1 |
http://arxiv.org/pdf/1601.07252v1.pdf | |
PWC | https://paperswithcode.com/paper/font-identification-in-historical-documents |
Repo | |
Framework | |
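A sketch of the uncertainty-plus-diversity selection loop described above: shortlist the least confident unlabeled pages, then take one per k-means cluster so the batch is not redundant. The classifier, feature dimensionality, candidate-pool size, and data are placeholder assumptions rather than the paper's exact sampling criteria.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

def select_batch(clf, X_pool, batch_size=10, candidate_pool=100):
    """Uncertainty + diversity sampling: shortlist the least confident pool items,
    then pick the most uncertain item in each k-means cluster."""
    proba = clf.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)                     # least-confident score
    candidates = np.argsort(-uncertainty)[:candidate_pool]
    km = KMeans(n_clusters=batch_size, n_init=10, random_state=0).fit(X_pool[candidates])
    picked = []
    for c in range(batch_size):
        members = candidates[km.labels_ == c]
        picked.append(members[np.argmax(uncertainty[members])])
    return np.array(picked)

# Toy run on random page-level features (placeholder for the bag-of-words font features).
rng = np.random.default_rng(0)
X_lab, y_lab = rng.standard_normal((50, 30)), rng.integers(0, 3, 50)   # Blackletter/Roman/Mixed
X_pool = rng.standard_normal((2000, 30))
clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
to_label = select_batch(clf, X_pool)
print(to_label)                                               # pool indices to send to annotators
```

After each batch is labeled, the classifier is retrained and the loop repeats until the labeling budget is spent.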
Representation learning for very short texts using weighted word embedding aggregation
Title | Representation learning for very short texts using weighted word embedding aggregation |
Authors | Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt |
Abstract | Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box. |
Tasks | Opinion Mining, Representation Learning, Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2016-07-02 |
URL | http://arxiv.org/abs/1607.00570v1 |
http://arxiv.org/pdf/1607.00570v1.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-for-very-short-texts |
Repo | |
Framework | |
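The aggregation itself is a weighted mean of word embeddings, with weights tied to how informative each word is within the short text. The sketch below ranks words by idf and applies a per-rank weight vector; the weight values, idf table, and embeddings are toy assumptions, and the paper's learned weights and median-based training loss are not reproduced.

```python
import numpy as np

def short_text_vector(tokens, embeddings, idf, weights_by_rank):
    """Weighted average of word embeddings: words are ranked by idf inside the text
    and each rank receives an importance weight (supplied here as an array)."""
    words = [w for w in tokens if w in embeddings and w in idf]
    if not words:
        return None
    order = sorted(range(len(words)), key=lambda i: -idf[words[i]])   # rank by idf
    vec = np.zeros(next(iter(embeddings.values())).shape)
    total = 0.0
    for rank, idx in enumerate(order):
        w = weights_by_rank[min(rank, len(weights_by_rank) - 1)]
        vec += w * embeddings[words[idx]]
        total += w
    return vec / total

# Toy vocabulary with random 50-d embeddings and made-up idf values.
rng = np.random.default_rng(0)
embeddings = {w: rng.standard_normal(50) for w in ["storm", "hits", "the", "coast"]}
idf = {"storm": 5.1, "hits": 3.2, "the": 0.1, "coast": 4.4}
weights_by_rank = np.linspace(1.0, 0.1, 10)        # informative words weigh more (assumed shape)
v = short_text_vector("storm hits the coast".split(), embeddings, idf, weights_by_rank)
print(v.shape)                                     # (50,)
```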
Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization
Title | Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization |
Authors | Minsoo Kim, Moirangthem Dennis Singh, Minho Lee |
Abstract | In this work, we introduce temporal hierarchies to the sequence to sequence (seq2seq) model to tackle the problem of abstractive summarization of scientific articles. The proposed Multiple Timescale model of the Gated Recurrent Unit (MTGRU) is implemented in the encoder-decoder setting to better deal with the presence of multiple compositionalities in larger texts. The proposed model is compared to the conventional RNN encoder-decoder, and the results demonstrate that our model trains faster and shows significant performance gains. The results also show that the temporal hierarchies help improve the ability of seq2seq models to capture compositionalities better without the presence of highly complex architectural hierarchies. |
Tasks | Abstractive Text Summarization |
Published | 2016-07-04 |
URL | http://arxiv.org/abs/1607.00718v1 |
http://arxiv.org/pdf/1607.00718v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-abstraction-from-extraction-multiple |
Repo | |
Framework | |
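A compact PyTorch sketch of the multiple-timescale idea: each layer's GRU state is smoothed with its own time constant, so higher layers evolve more slowly and can capture coarser compositional structure. The stack depth, time constants, and use of the slowest state as the summary context are illustrative choices, and the full encoder-decoder with the decoder side is omitted.

```python
import torch
import torch.nn as nn

class MultiTimescaleGRU(nn.Module):
    """A stack of GRU cells where layer k's state is an exponential moving average
    with time constant tau_k, so higher layers change more slowly."""
    def __init__(self, input_size, hidden_size, taus=(1.0, 4.0)):
        super().__init__()
        sizes = [input_size] + [hidden_size] * len(taus)
        self.cells = nn.ModuleList([nn.GRUCell(sizes[i], sizes[i + 1])
                                    for i in range(len(taus))])
        self.taus = taus

    def forward(self, x):                          # x: (seq_len, batch, input_size)
        batch = x.shape[1]
        hs = [x.new_zeros(batch, cell.hidden_size) for cell in self.cells]
        for t in range(x.shape[0]):
            inp = x[t]
            for k, (cell, tau) in enumerate(zip(self.cells, self.taus)):
                cand = cell(inp, hs[k])            # ordinary GRU update
                hs[k] = (1 - 1 / tau) * hs[k] + (1 / tau) * cand   # timescale smoothing
                inp = hs[k]
        return hs[-1]                              # slowest-layer state as the summary context

enc = MultiTimescaleGRU(input_size=128, hidden_size=256)
tokens = torch.randn(40, 8, 128)                   # 40 steps, batch of 8 embedded tokens
context = enc(tokens)
print(context.shape)                               # torch.Size([8, 256])
```

With tau = 1 the update reduces to a standard GRU layer, so the smoothing only affects the slower layers.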
Resource Constrained Structured Prediction
Title | Resource Constrained Structured Prediction |
Authors | Tolga Bolukbasi, Kai-Wei Chang, Joseph Wang, Venkatesh Saligrama |
Abstract | We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features during test-time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reduction of the feature costs without degrading accuracy. |
Tasks | Dependency Parsing, Optical Character Recognition, Structured Prediction |
Published | 2016-02-28 |
URL | http://arxiv.org/abs/1602.08761v2 |
http://arxiv.org/pdf/1602.08761v2.pdf | |
PWC | https://paperswithcode.com/paper/resource-constrained-structured-prediction |
Repo | |
Framework | |
Supervised multiview learning based on simultaneous learning of multiview intact and single view classifier
Title | Supervised multiview learning based on simultaneous learning of multiview intact and single view classifier |
Authors | Qingjun Wang, Haiyan Lv, Jun Yue, Eugene Mitchell |
Abstract | The multiview learning problem refers to the problem of learning a classifier from multiple-view data, in which each data point is represented by multiple different views. In this paper, we propose a novel method for this problem. This method is based on two assumptions. The first assumption is that each data point has an intact feature vector, and each view is obtained by a linear transformation from the intact vector. The second assumption is that the intact vectors are discriminative, and in the intact space, we have a linear classifier to separate the positive class from the negative class. We define an intact vector for each data point, and a view-conditional transformation matrix for each view, and propose to reconstruct the multiple view feature vectors by the product of the corresponding intact vectors and transformation matrices. Moreover, we also propose a linear classifier in the intact space, and learn it jointly with the intact vectors. The learning problem is modeled by a minimization problem, and the objective function is composed of a Cauchy error estimator-based view-conditional reconstruction term over all data points and views, and a classification error term measured by hinge loss over all the intact vectors of all the data points. Some regularization terms are also imposed on different variables in the objective function. The minimization problem is solved by an iterative algorithm using an alternating optimization strategy and a gradient descent algorithm. The proposed algorithm shows its advantage in comparison to other multiview learning algorithms on benchmark data sets. |
Tasks | Multiview Learning |
Published | 2016-01-09 |
URL | http://arxiv.org/abs/1601.02098v1 |
http://arxiv.org/pdf/1601.02098v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-multiview-learning-based-on |
Repo | |
Framework | |
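The objective described in the abstract can be written out directly: Cauchy-loss reconstruction of every view from the intact vectors, a hinge-loss classifier in the intact space, and regularizers. The numpy sketch below only evaluates that objective on toy data; the Cauchy scale, the squared-norm form of the regularizers, and all dimensions are assumptions, and the alternating-minimization solver is not implemented.

```python
import numpy as np

def objective(Xs, y, Z, Ws, w, c=1.0, l1=0.1, l2=0.1, l3=0.1):
    """Joint objective: Cauchy reconstruction error of every view from the intact
    vectors, hinge loss of the linear classifier in the intact space, plus
    squared-norm regularizers (the regularizer form is an assumption)."""
    recon = 0.0
    for X, W in zip(Xs, Ws):                       # X: (n, d_v), W: (d_v, m), Z: (n, m)
        R = X - Z @ W.T
        recon += np.sum(np.log(1 + (np.linalg.norm(R, axis=1) / c) ** 2))  # Cauchy estimator
    hinge = np.sum(np.maximum(0.0, 1.0 - y * (Z @ w)))
    reg = l1 * np.sum(Z ** 2) + l2 * sum(np.sum(W ** 2) for W in Ws) + l3 * np.sum(w ** 2)
    return recon + hinge + reg

# Toy setup: 2 views of 20 samples, 3-dimensional intact space.
rng = np.random.default_rng(0)
n, m = 20, 3
Xs = [rng.standard_normal((n, 5)), rng.standard_normal((n, 8))]
y = rng.choice([-1.0, 1.0], n)
Z = rng.standard_normal((n, m))
Ws = [rng.standard_normal((5, m)), rng.standard_normal((8, m))]
w = rng.standard_normal(m)
print(objective(Xs, y, Z, Ws, w))                  # alternating minimization would lower this
```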
Inducing Interpretable Representations with Variational Autoencoders
Title | Inducing Interpretable Representations with Variational Autoencoders |
Authors | N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H. S. Torr |
Abstract | We develop a framework for incorporating structured graphical models in the \emph{encoders} of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07492v1 |
http://arxiv.org/pdf/1611.07492v1.pdf | |
PWC | https://paperswithcode.com/paper/inducing-interpretable-representations-with |
Repo | |
Framework | |