October 21, 2019

3083 words 15 mins read

Paper Group AWR 77


ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites. Not All Samples Are Created Equal: Deep Learning with Importance Sampling. GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination. Decoupled Networks. Gaussian Mixture Latent Vector Grammars. CGMH: Constrained Sentence Generation by Metropolis-Hastings …

ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites

Title ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites
Authors Jiankai Sun, Abhinav Vishnu, Aniket Chakrabarti, Charles Siegel, Srinivasan Parthasarathy
Abstract Routing questions in Community Question Answering services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00462v1
PDF http://arxiv.org/pdf/1807.00462v1.pdf
PWC https://paperswithcode.com/paper/coldroute-effective-routing-of-cold-questions
Repo https://github.com/zhenv5/ColdRoute
Framework none
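
A minimal sketch of the core scoring idea named in the abstract, a second-order Factorization Machine over one-hot features such as question tags and asker/expert ids. Names and shapes are illustrative and not taken from the ColdRoute repository.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Factorization Machine score for one question-expert pair.
    x: (n,) one-hot feature vector, w0: bias, w: (n,) linear weights,
    V: (n, k) latent factors."""
    linear = w0 + w @ x
    # O(nk) pairwise-interaction identity from Rendle (2010)
    xv = V.T @ x                  # (k,)
    x2v2 = (V ** 2).T @ (x ** 2)  # (k,)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

# Routing a cold question = ranking candidate experts by fm_score on the
# concatenated (question tags, asker id, expert id) encoding.
```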

Not All Samples Are Created Equal: Deep Learning with Importance Sampling

Title Not All Samples Are Created Equal: Deep Learning with Importance Sampling
Authors Angelos Katharopoulos, François Fleuret
Abstract Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on “informative” examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the train losses of up to an order of magnitude and a relative improvement of test errors between 5% and 17%.
Tasks Image Classification
Published 2018-03-02
URL https://arxiv.org/abs/1803.00942v3
PDF https://arxiv.org/pdf/1803.00942v3.pdf
PWC https://paperswithcode.com/paper/not-all-samples-are-created-equal-deep
Repo https://github.com/idiap/importance-sampling
Framework tf
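
A hedged sketch of the sampling step the paper motivates: score each example in a large candidate batch (for instance with an upper-bound proxy on its gradient norm), sample a smaller batch proportionally to the scores, and reweight so the gradient estimate stays unbiased. This illustrates the idea only and is not the released implementation.

```python
import numpy as np

def importance_sample(scores, batch_size, rng=np.random):
    """scores: (N,) nonnegative per-sample importance scores.
    Returns sampled indices and weights w_i = 1 / (N * p_i) that keep the
    reweighted gradient estimate unbiased."""
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=batch_size, replace=True, p=p)
    return idx, 1.0 / (len(scores) * p[idx])

# In a training step: run a cheap forward pass over the large batch to get
# scores, then do the expensive backward pass only on `idx`, scaling each
# sampled loss by its weight before averaging.
```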

GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination

Title GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination
Authors Junyuan Shang, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun
Abstract Recent progress in deep learning is revolutionizing the healthcare domain, including providing solutions to medication recommendation, especially recommending medication combinations for patients with complex health conditions. Existing approaches either do not customize based on patient health history, or ignore existing knowledge on drug-drug interactions (DDI) that might lead to adverse outcomes. To fill this gap, we propose Graph Augmented Memory Networks (GAMENet), which integrates the drug-drug interaction knowledge graph via a memory module implemented as graph convolutional networks, and models longitudinal patient records as the query. It is trained end-to-end to provide safe and personalized recommendations of medication combinations. We demonstrate the effectiveness and safety of GAMENet by comparing with several state-of-the-art methods on real EHR data. GAMENet outperformed all baselines in all effectiveness measures, and also achieved a 3.60% DDI rate reduction from existing EHR data.
Tasks
Published 2018-09-06
URL http://arxiv.org/abs/1809.01852v3
PDF http://arxiv.org/pdf/1809.01852v3.pdf
PWC https://paperswithcode.com/paper/gamenet-graph-augmented-memory-networks-for
Repo https://github.com/sjy1203/GAMENet
Framework pytorch
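
An illustrative sketch (not the authors' code) of the two ingredients the abstract names: a graph convolution over a drug graph that builds the memory bank, and an attention read keyed by the patient-history query.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer with symmetric normalization over a drug graph
    (e.g. the DDI graph). A: (m, m) adjacency, H: (m, d) drug embeddings,
    W: (d, d') weights."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU

def read_memory(query, memory):
    """Attention read over the memory bank. query: (d,) patient
    representation, memory: (m, d) drug memory built by the GCN.
    Returns (m,) scores; a sigmoid would turn them into multi-label
    medication probabilities in a simplified head."""
    logits = memory @ query
    att = np.exp(logits - logits.max())
    att /= att.sum()
    context = att @ memory
    return memory @ (query + context)
```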

Decoupled Networks

Title Decoupled Networks
Authors Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song
Abstract Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness.
Tasks
Published 2018-04-22
URL http://arxiv.org/abs/1804.08071v1
PDF http://arxiv.org/pdf/1804.08071v1.pdf
PWC https://paperswithcode.com/paper/decoupled-networks
Repo https://github.com/yujiacheng333/BaseDcLayer
Framework tf
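
A minimal sketch of the decoupled reparameterization: the inner product <w, x> = |w||x|cos(theta) is split into a magnitude term h(|w|, |x|) and an angular term g(theta) that can then be chosen independently. The specific h and g below are one illustrative instance, not the full family studied in the paper.

```python
import numpy as np

def decoupled_score(w, x, rho=1.0):
    """One decoupled-operator response for a filter w and an input patch x."""
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_theta = (w @ x) / (nw * nx + 1e-12)
    h = np.tanh(nx / rho)   # bounded magnitude function (intra-class variation)
    g = cos_theta           # linear angular activation (semantic difference)
    return h * g

# The ordinary convolution response is recovered with h = |w| * |x| and g = cos(theta).
```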

Gaussian Mixture Latent Vector Grammars

Title Gaussian Mixture Latent Vector Grammars
Authors Yanpeng Zhao, Liwen Zhang, Kewei Tu
Abstract We introduce Latent Vector Grammars (LVeGs), a new framework that extends latent variable grammars such that each nonterminal symbol is associated with a continuous vector space representing the set of (infinitely many) subtypes of the nonterminal. We show that previous models such as latent variable grammars and compositional vector grammars can be interpreted as special cases of LVeGs. We then present Gaussian Mixture LVeGs (GM-LVeGs), a new special case of LVeGs that uses Gaussian mixtures to formulate the weights of production rules over subtypes of nonterminals. A major advantage of using Gaussian mixtures is that the partition function and the expectations of subtype rules can be computed using an extension of the inside-outside algorithm, which enables efficient inference and learning. We apply GM-LVeGs to part-of-speech tagging and constituency parsing and show that GM-LVeGs can achieve competitive accuracies. Our code is available at https://github.com/zhaoyanpeng/lveg.
Tasks Constituency Parsing, Part-Of-Speech Tagging
Published 2018-05-12
URL http://arxiv.org/abs/1805.04688v1
PDF http://arxiv.org/pdf/1805.04688v1.pdf
PWC https://paperswithcode.com/paper/gaussian-mixture-latent-vector-grammars
Repo https://github.com/zhaoyanpeng/lveg
Framework none
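
A hedged sketch of how a GM-LVeG weights a binary production rule A -> B C: the continuous subtype vectors of the parent and the two children are concatenated and scored under a Gaussian mixture attached to that rule. Shapes and names are illustrative only.

```python
import numpy as np
from scipy.stats import multivariate_normal

def rule_weight(a, b, c, mixture):
    """a, b, c: subtype vectors of A, B, C; mixture: list of
    (weight, mean, cov) components for the rule A -> B C."""
    v = np.concatenate([a, b, c])
    return sum(w * multivariate_normal.pdf(v, mean=mu, cov=cov)
               for w, mu, cov in mixture)
```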

CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling

Title CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling
Authors Ning Miao, Hao Zhou, Lili Mou, Rui Yan, Lei Li
Abstract In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage, and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH
Tasks Text Generation
Published 2018-11-14
URL http://arxiv.org/abs/1811.10996v1
PDF http://arxiv.org/pdf/1811.10996v1.pdf
PWC https://paperswithcode.com/paper/cgmh-constrained-sentence-generation-by
Repo https://github.com/NingMiao/CGMH
Framework tf
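
An illustrative sketch of one CGMH step: propose a local word-level edit, reject it if a required keyword disappears, and otherwise accept with a Metropolis-Hastings ratio under a language-model score. `lm_prob` and `propose_edit` are hypothetical helpers standing in for the model and the proposal distribution, not the released API.

```python
import random

def cgmh_step(sentence, keywords, lm_prob, propose_edit):
    """sentence: list of tokens; keywords: tokens that must stay present.
    propose_edit returns (candidate, forward proposal prob, backward proposal prob)."""
    candidate, q_forward, q_backward = propose_edit(sentence)
    if not all(k in candidate for k in keywords):
        return sentence                           # constraint violated: reject
    ratio = (lm_prob(candidate) * q_backward) / (lm_prob(sentence) * q_forward + 1e-30)
    return candidate if random.random() < min(1.0, ratio) else sentence
```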

Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case

Title Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case
Authors Christopher K. I. Williams, Charlie Nash, Alfredo Nazábal
Abstract Latent variable models can be used to probabilistically “fill-in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the factor analysis (FA) model in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to filling in the missing data.
Tasks Latent Variable Models
Published 2018-01-11
URL http://arxiv.org/abs/1801.03851v3
PDF http://arxiv.org/pdf/1801.03851v3.pdf
PWC https://paperswithcode.com/paper/autoencoders-and-probabilistic-inference-with
Repo https://github.com/Kismuz/crypto_spread_test
Framework tf
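
A minimal sketch of the exact solution described in the abstract: for a given missingness pattern, drop the missing rows of the loadings, mean and noise variances, then apply the standard factor-analysis posterior to the observed sub-model. Illustrative code, not the paper's experiments.

```python
import numpy as np

def fa_posterior_missing(x, observed, W, mu, psi):
    """x: (d,) data vector, observed: (d,) boolean mask, W: (d, k) loadings,
    mu: (d,) mean, psi: (d,) diagonal noise variances.
    Returns mean and covariance of p(z | x_observed)."""
    Wo = W[observed]
    resid = (x[observed] - mu[observed]) / psi[observed]
    precision = np.eye(W.shape[1]) + Wo.T @ (Wo / psi[observed, None])
    cov = np.linalg.inv(precision)
    return cov @ (Wo.T @ resid), cov
```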

COPA: Constrained PARAFAC2 for Sparse & Large Datasets

Title COPA: Constrained PARAFAC2 for Sparse & Large Datasets
Authors Ardavan Afshar, Ioakeim Perros, Evangelos E. Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun
Abstract PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with varying numbers of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36 times faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.
Tasks
Published 2018-03-12
URL http://arxiv.org/abs/1803.04572v2
PDF http://arxiv.org/pdf/1803.04572v2.pdf
PWC https://paperswithcode.com/paper/copa-constrained-parafac2-for-sparse-large
Repo https://github.com/aafshar/COPA
Framework none
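
A hedged sketch of the kind of proximal updates an AO-ADMM scheme like COPA applies after each least-squares sub-step; non-negativity and l1 sparsity are shown as example constraints, and this is not the authors' implementation (which also handles temporal smoothness).

```python
import numpy as np

def prox_nonneg(Z):
    """Project a factor matrix onto the non-negative orthant."""
    return np.maximum(Z, 0.0)

def prox_l1(Z, lam):
    """Soft-thresholding, the proximal operator of lam * ||Z||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)
```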

Unsupervised Learning of Syntactic Structure with Invertible Neural Projections

Title Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
Authors Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick
Abstract Unsupervised learning of syntactic structure is typically performed using generative models with discrete latent variables and multinomial parameters. In most cases, these models have not leveraged continuous word representations. In this work, we propose a novel generative model that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior. We show that the invertibility condition allows for efficient exact inference and marginal likelihood computation in our model so long as the prior is well-behaved. In experiments we instantiate our approach with both Markov and tree-structured priors, evaluating on two tasks: part-of-speech (POS) induction, and unsupervised dependency parsing without gold POS annotation. On the Penn Treebank, our Markov-structured model surpasses state-of-the-art results on POS induction. Similarly, we find that our tree-structured model achieves state-of-the-art performance on unsupervised dependency parsing for the difficult training condition where neither gold POS annotation nor punctuation-based constraints are available.
Tasks Constituency Grammar Induction, Dependency Parsing
Published 2018-08-28
URL http://arxiv.org/abs/1808.09111v1
PDF http://arxiv.org/pdf/1808.09111v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-syntactic-structure
Repo https://github.com/jxhe/struct-learning-with-flow
Framework pytorch
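
A minimal change-of-variables sketch behind the invertible projection: a word vector is mapped back to the latent space of the structured prior, and its log-density picks up the log-determinant of the inverse Jacobian. The invertible map here is a plain affine one purely for illustration; the paper uses an invertible neural network with Markov or tree-structured priors.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_density(x, A, b, prior_mean, prior_cov):
    """Log-density of x = A e + b (A invertible) when e has a Gaussian prior."""
    A_inv = np.linalg.inv(A)
    e = A_inv @ (x - b)
    log_det = np.linalg.slogdet(A_inv)[1]
    return multivariate_normal.logpdf(e, mean=prior_mean, cov=prior_cov) + log_det
```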

Training VAEs Under Structured Residuals

Title Training VAEs Under Structured Residuals
Authors Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson
Abstract Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer range correlations.
Tasks
Published 2018-04-03
URL http://arxiv.org/abs/1804.01050v3
PDF http://arxiv.org/pdf/1804.01050v3.pdf
PWC https://paperswithcode.com/paper/training-vaes-under-structured-residuals
Repo https://github.com/Garoe/tf_mvg
Framework tf
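
A sketch of the structured-likelihood idea: instead of a per-pixel diagonal variance, the decoder also predicts a Cholesky factor of the residual covariance so correlations between pixels can be modeled. The dense, toy-sized factor below is illustrative only; the paper keeps this tractable with a structured parameterization.

```python
import numpy as np
from scipy.stats import multivariate_normal

def structured_residual_logprob(x, mean, chol):
    """x, mean: (d,) flattened images; chol: (d, d) lower-triangular Cholesky
    factor predicted by the decoder. Scores x under N(mean, chol @ chol.T)."""
    cov = chol @ chol.T
    return multivariate_normal.logpdf(x, mean=mean, cov=cov)
```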

SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

Title SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
Authors Silvio Giancola, Mohieddine Amine, Tarek Dghaily, Bernard Ghanem
Abstract In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet.
Tasks Action Classification, Action Detection, Action Spotting
Published 2018-04-12
URL http://arxiv.org/abs/1804.04527v2
PDF http://arxiv.org/pdf/1804.04527v2.pdf
PWC https://paperswithcode.com/paper/soccernet-a-scalable-dataset-for-action
Repo https://github.com/SilvioGiancola/SoccerNet-code
Framework tf
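
A hedged sketch of the spotting criterion described in the abstract: a predicted spot counts as a true positive if it falls within a tolerance delta of an unmatched ground-truth anchor of the same class. A simplified matcher for illustration, not the official evaluation code.

```python
def spotting_true_positives(predictions, ground_truth, delta):
    """predictions, ground_truth: lists of (time_in_seconds, label).
    Returns the number of predictions matched within +/- delta seconds."""
    used, hits = set(), 0
    for t, label in sorted(predictions):
        for i, (gt_time, gt_label) in enumerate(ground_truth):
            if i not in used and gt_label == label and abs(t - gt_time) <= delta:
                used.add(i)
                hits += 1
                break
    return hits
```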

Restricted Boltzmann Machines: Introduction and Review

Title Restricted Boltzmann Machines: Introduction and Review
Authors Guido Montufar
Abstract The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
Tasks
Published 2018-06-19
URL http://arxiv.org/abs/1806.07066v1
PDF http://arxiv.org/pdf/1806.07066v1.pdf
PWC https://paperswithcode.com/paper/restricted-boltzmann-machines-introduction
Repo https://github.com/Kevin-Sean-Chen/Restriced_Boltzmann_Machine
Framework none
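
For reference alongside the review, the textbook energy function and one block-Gibbs step of a binary RBM; this is the standard formulation, not code from the linked repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(v, h, W, b, c):
    """E(v, h) = -b'v - c'h - v'Wh for binary visible v and hidden h."""
    return -(b @ v + c @ h + v @ W @ h)

def gibbs_step(v, W, b, c, rng=np.random):
    """Sample h | v, then v | h, using the factorized conditionals."""
    h = (rng.rand(W.shape[1]) < sigmoid(c + v @ W)).astype(float)
    v_new = (rng.rand(W.shape[0]) < sigmoid(b + W @ h)).astype(float)
    return v_new, h
```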

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

Title ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Authors Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez
Abstract Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging “synthetic-2-real” set-ups and show that the approach can also be used for detection.
Tasks Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2018-11-30
URL http://arxiv.org/abs/1811.12833v2
PDF http://arxiv.org/pdf/1811.12833v2.pdf
PWC https://paperswithcode.com/paper/advent-adversarial-entropy-minimization-for
Repo https://github.com/valeoai/ADVENT
Framework pytorch
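
A minimal sketch of the direct entropy term used on target-domain images: minimize the per-pixel Shannon entropy of the softmax predictions. The adversarial variant, which instead aligns weighted self-information maps with a discriminator, is not shown here.

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits):
    """logits: (B, C, H, W) segmentation scores for target-domain images."""
    p = F.softmax(logits, dim=1)
    pixel_entropy = -(p * torch.log(p + 1e-12)).sum(dim=1)  # (B, H, W)
    return pixel_entropy.mean()
```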

Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models

Title Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models
Authors Anton Belyy
Abstract In our work, we propose to represent a hierarchical topic model (HTM) as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for edges of hierarchical models, resembling those proposed for flat models. We conduct an assessment experiment and show strong correlation between the proposed measures and human judgement of topical edge quality. We also introduce a heterogeneous algorithm to build hierarchical topic models for heterogeneous data sources. We show how making certain adjustments to the learning process helps to retain the original structure of customized models while allowing slight coherent modifications for new documents. We evaluate this approach using the proposed measures and show that the proposed heterogeneous algorithm significantly outperforms the baseline concat approach. Finally, we implement our own exploratory search engine (ESE), Rysearch, which demonstrates the potential of the ARTM approach for visualizing large heterogeneous document collections.
Tasks Topic Models
Published 2018-11-07
URL http://arxiv.org/abs/1811.02820v1
PDF http://arxiv.org/pdf/1811.02820v1.pdf
PWC https://paperswithcode.com/paper/construction-and-quality-evaluation-of
Repo https://github.com/AVBelyy/Rysearch
Framework none

Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols

Title Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols
Authors Siddharth Pramod
Abstract Distributing Neural Network training is of particular interest for several reasons including scaling using computing clusters, training at data sources such as IOT devices and edge servers, utilizing underutilized resources across heterogeneous environments, and so on. Most contemporary approaches primarily address scaling using computing clusters and require high network bandwidth and frequent communication. This thesis presents an overview of standard approaches to distribute training and proposes a novel technique involving pairwise-communication using Gossip-like protocols, called Elastic Gossip. This approach builds upon an existing technique known as Elastic Averaging SGD (EASGD), and is similar to another technique called Gossiping SGD which also uses Gossip-like protocols. Elastic Gossip is empirically evaluated against Gossiping SGD using the MNIST digit recognition and CIFAR-10 classification tasks, using commonly used Neural Network architectures spanning Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). It is found that Elastic Gossip, Gossiping SGD, and All-reduce SGD perform quite comparably, even though the latter entails a substantially higher communication cost. While Elastic Gossip performs better than Gossiping SGD in these experiments, it is possible that a more thorough search over hyper-parameter space, specific to a given application, may yield configurations of Gossiping SGD that work better than Elastic Gossip.
Tasks
Published 2018-12-06
URL http://arxiv.org/abs/1812.02407v1
PDF http://arxiv.org/pdf/1812.02407v1.pdf
PWC https://paperswithcode.com/paper/elastic-gossip-distributing-neural-network
Repo https://github.com/sidps/dist_training
Framework pytorch
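
A hedged sketch of the pairwise update at the heart of the idea: after some local SGD steps, two randomly paired workers pull their parameters toward each other with an elastic coefficient, in the spirit of EASGD but peer-to-peer. An illustration of the concept, not the thesis implementation.

```python
def elastic_gossip_exchange(params_i, params_j, alpha):
    """params_i, params_j: lists of parameter arrays from the two peers;
    alpha: elastic coefficient controlling how strongly they attract.
    Returns the updated parameter lists for both peers."""
    new_i = [pi - alpha * (pi - pj) for pi, pj in zip(params_i, params_j)]
    new_j = [pj - alpha * (pj - pi) for pi, pj in zip(params_i, params_j)]
    return new_i, new_j
```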