Paper Group AWR 77
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites. Not All Samples Are Created Equal: Deep Learning with Importance Sampling. GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination. Decoupled Networks. Gaussian Mixture Latent Vector Grammars. CGMH: Constrained Sentence Generation by Metropolis-Hastings …
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites
Title | ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites |
Authors | Jiankai Sun, Abhinav Vishnu, Aniket Chakrabarti, Charles Siegel, Srinivasan Parthasarathy |
Abstract | Routing questions in Community Question Answering services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start – a phenomenon observed when a new question is posted – is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00462v1 |
http://arxiv.org/pdf/1807.00462v1.pdf | |
PWC | https://paperswithcode.com/paper/coldroute-effective-routing-of-cold-questions |
Repo | https://github.com/zhenv5/ColdRoute |
Framework | none |
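The routing step described above scores a cold question against candidate experts with a Factorization Machine over one-hot features such as question tags and asker/expert identifiers. Below is a minimal numpy sketch of the second-order FM scoring function; the feature layout and dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score for a one-hot/sparse feature vector x.

    x  : (d,)   binary indicator features (e.g. question tags, asker id, expert id)
    w0 : scalar bias
    w  : (d,)   linear weights
    V  : (d, k) latent factors; the pairwise weight for features (i, j) is <V[i], V[j]>
    """
    linear = w0 + w @ x
    # O(d*k) reformulation of the pairwise interaction term
    s = V.T @ x                    # (k,)
    s_sq = (V ** 2).T @ (x ** 2)   # (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

# Toy usage: score one candidate expert for a cold question
d, k = 10, 4
rng = np.random.default_rng(0)
x = np.zeros(d)
x[[1, 3, 7]] = 1.0   # active one-hot features
print(fm_score(x, 0.1, rng.normal(size=d), rng.normal(scale=0.1, size=(d, k))))
```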
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
Title | Not All Samples Are Created Equal: Deep Learning with Importance Sampling |
Authors | Angelos Katharopoulos, François Fleuret |
Abstract | Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on “informative” examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the training loss of up to an order of magnitude and a relative improvement of test errors between 5% and 17%. |
Tasks | Image Classification |
Published | 2018-03-02 |
URL | https://arxiv.org/abs/1803.00942v3 |
https://arxiv.org/pdf/1803.00942v3.pdf | |
PWC | https://paperswithcode.com/paper/not-all-samples-are-created-equal-deep |
Repo | https://github.com/idiap/importance-sampling |
Framework | tf |
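The scheme samples training examples with probability proportional to an upper bound on their per-sample gradient norm and reweights the sampled gradients so the SGD update stays unbiased. A minimal sketch of that sampling-and-reweighting step, with placeholder scores standing in for the paper's bound:

```python
import numpy as np

def importance_sample(scores, batch_size, rng):
    """Sample indices proportionally to per-sample scores and return unbiasing weights.

    scores : (N,) nonnegative per-sample importance scores
             (e.g. an upper bound on the per-sample gradient norm)
    Returns (idx, w) where w[i] = 1 / (N * p[idx[i]]), so that the weighted
    gradient sum remains an unbiased estimate of the full-batch gradient.
    """
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=batch_size, replace=True, p=p)
    w = 1.0 / (len(scores) * p[idx])
    return idx, w

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)            # stand-in for gradient-norm bounds
idx, w = importance_sample(scores, 32, rng)
# Each sampled gradient g_i would then be scaled by w[i] before the SGD update.
```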
GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination
Title | GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination |
Authors | Junyuan Shang, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun |
Abstract | Recent progress in deep learning is revolutionizing the healthcare domain, including providing solutions for medication recommendation, especially recommending medication combinations for patients with complex health conditions. Existing approaches either do not customize based on patient health history or ignore existing knowledge on drug-drug interactions (DDI) that might lead to adverse outcomes. To fill this gap, we propose Graph Augmented Memory Networks (GAMENet), which integrates the drug-drug interaction knowledge graph via a memory module implemented as graph convolutional networks, and models longitudinal patient records as the query. It is trained end-to-end to provide safe and personalized recommendations of medication combinations. We demonstrate the effectiveness and safety of GAMENet by comparing it with several state-of-the-art methods on real EHR data. GAMENet outperformed all baselines in all effectiveness measures, and also achieved a 3.60% DDI rate reduction relative to existing EHR data. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01852v3 |
http://arxiv.org/pdf/1809.01852v3.pdf | |
PWC | https://paperswithcode.com/paper/gamenet-graph-augmented-memory-networks-for |
Repo | https://github.com/sjy1203/GAMENet |
Framework | pytorch |
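The memory module keys drug representations computed by graph convolutional networks over the DDI knowledge graph. Below is a minimal sketch of one symmetrically normalized GCN propagation step of the kind such a module could use; this is a generic GCN layer, not the authors' exact architecture.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A : (n, n) adjacency matrix of the drug-drug interaction graph
    H : (n, f_in) node (drug) embeddings
    W : (f_in, f_out) trainable weights
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy DDI graph with 4 drugs
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8))
memory_keys = gcn_layer(A, H, W)   # rows could serve as memory keys/values
```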
Decoupled Networks
Title | Decoupled Networks |
Authors | Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song |
Abstract | Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness. |
Tasks | |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08071v1 |
http://arxiv.org/pdf/1804.08071v1.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-networks |
Repo | https://github.com/yujiacheng333/BaseDcLayer |
Framework | tf |
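The decoupled operator rewrites the inner product w.x = ||w|| ||x|| cos(theta) as h(||w||, ||x||) * g(theta), so that magnitude (intra-class variation) and angle (semantic difference) are modeled separately. The sketch below uses one illustrative instantiation, a saturating magnitude function with a cosine angular activation; the paper proposes several concrete variants.

```python
import numpy as np

def decoupled_response(w, x, alpha=1.0):
    """Decoupled operator f(w, x) = h(||w||, ||x||) * g(theta).

    Here h is a saturating magnitude function alpha * ||x|| / (1 + ||x||)
    (one illustrative choice) and g(theta) = cos(theta), in contrast to the
    standard inner product where h = ||w|| * ||x|| and g = cos(theta).
    """
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_theta = w @ x / (nw * nx + 1e-12)
    h = alpha * nx / (1.0 + nx)   # intra-class variation (magnitude)
    g = cos_theta                 # semantic difference (angle)
    return h * g

w = np.array([1.0, 2.0, -0.5])
x = np.array([0.3, 1.1, 0.4])
print(decoupled_response(w, x), w @ x)   # decoupled vs. ordinary inner product
```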
Gaussian Mixture Latent Vector Grammars
Title | Gaussian Mixture Latent Vector Grammars |
Authors | Yanpeng Zhao, Liwen Zhang, Kewei Tu |
Abstract | We introduce Latent Vector Grammars (LVeGs), a new framework that extends latent variable grammars such that each nonterminal symbol is associated with a continuous vector space representing the set of (infinitely many) subtypes of the nonterminal. We show that previous models such as latent variable grammars and compositional vector grammars can be interpreted as special cases of LVeGs. We then present Gaussian Mixture LVeGs (GM-LVeGs), a new special case of LVeGs that uses Gaussian mixtures to formulate the weights of production rules over subtypes of nonterminals. A major advantage of using Gaussian mixtures is that the partition function and the expectations of subtype rules can be computed using an extension of the inside-outside algorithm, which enables efficient inference and learning. We apply GM-LVeGs to part-of-speech tagging and constituency parsing and show that GM-LVeGs can achieve competitive accuracies. Our code is available at https://github.com/zhaoyanpeng/lveg. |
Tasks | Constituency Parsing, Part-Of-Speech Tagging |
Published | 2018-05-12 |
URL | http://arxiv.org/abs/1805.04688v1 |
http://arxiv.org/pdf/1805.04688v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-mixture-latent-vector-grammars |
Repo | https://github.com/zhaoyanpeng/lveg |
Framework | none |
CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling
Title | CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling |
Authors | Ning Miao, Hao Zhou, Lili Mou, Rui Yan, Lei Li |
Abstract | In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage, and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH |
Tasks | Text Generation |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.10996v1 |
http://arxiv.org/pdf/1811.10996v1.pdf | |
PWC | https://paperswithcode.com/paper/cgmh-constrained-sentence-generation-by |
Repo | https://github.com/NingMiao/CGMH |
Framework | tf |
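CGMH edits a sentence one position at a time (replacing, inserting, or deleting a word) and accepts each proposal with a Metropolis-Hastings test on a sentence score, e.g. a language-model probability multiplied by constraint indicators. The sketch below is a toy version of that accept/reject loop: the scorer is a placeholder and the acceptance ratio omits the proposal-probability correction used in the full method.

```python
import random

def sentence_score(words, keywords):
    """Placeholder stationary density: keyword constraint times a dummy fluency term.
    In CGMH this would be a language-model probability times constraint indicators."""
    if not all(k in words for k in keywords):
        return 0.0
    return 1.0 / (1.0 + abs(len(words) - 8))   # toy preference for ~8-word sentences

def mh_step(words, vocab, keywords, rng=random):
    pos = rng.randrange(len(words))
    op = rng.choice(["replace", "insert", "delete"])
    proposal = list(words)
    if op == "replace":
        proposal[pos] = rng.choice(vocab)
    elif op == "insert":
        proposal.insert(pos, rng.choice(vocab))
    elif op == "delete" and len(proposal) > 1:
        del proposal[pos]
    old, new = sentence_score(words, keywords), sentence_score(proposal, keywords)
    # Simplified MH acceptance: always move out of zero-density states
    accept = old == 0.0 or rng.random() < min(1.0, new / old)
    return proposal if accept else words

words = ["machine", "learning", "is", "fun"]
vocab = ["deep", "models", "generate", "text", "with", "constraints", "fluently"]
for _ in range(200):
    words = mh_step(words, vocab, ["machine", "text"])
print(words)
```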
Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case
Title | Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case |
Authors | Christopher K. I. Williams, Charlie Nash, Alfredo Nazábal |
Abstract | Latent variable models can be used to probabilistically “fill-in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the factor analysis (FA) model in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to filling in the missing data. |
Tasks | Latent Variable Models |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03851v3 |
http://arxiv.org/pdf/1801.03851v3.pdf | |
PWC | https://paperswithcode.com/paper/autoencoders-and-probabilistic-inference-with |
Repo | https://github.com/Kismuz/crypto_spread_test |
Framework | tf |
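For the standard factor analysis model x = Wz + mu + eps with eps ~ N(0, Psi) diagonal and z ~ N(0, I), the exact latent posterior given only the observed entries follows by restricting the rows of W and Psi to the observed dimensions, which is why each missingness pattern effectively needs its own encoder. A minimal numpy sketch, assuming this standard parameterisation:

```python
import numpy as np

def fa_posterior_missing(x, observed, W, mu, psi):
    """Exact posterior p(z | x_o) for factor analysis with missing data.

    x        : (d,) data vector (entries at unobserved positions are ignored)
    observed : (d,) boolean mask of observed entries
    W        : (d, k) factor loading matrix
    mu       : (d,) mean
    psi      : (d,) diagonal noise variances
    Returns the posterior mean (k,) and covariance (k, k) of the latent z.
    """
    Wo = W[observed]                      # loadings restricted to observed rows
    xo = x[observed] - mu[observed]       # centred observed entries
    psi_o_inv = 1.0 / psi[observed]
    k = W.shape[1]
    cov = np.linalg.inv(np.eye(k) + Wo.T @ (psi_o_inv[:, None] * Wo))
    mean = cov @ Wo.T @ (psi_o_inv * xo)
    return mean, cov

rng = np.random.default_rng(0)
d, k = 6, 2
W, mu, psi = rng.normal(size=(d, k)), np.zeros(d), np.full(d, 0.5)
x = rng.normal(size=d)
mask = np.array([True, True, False, True, False, True])
m, S = fa_posterior_missing(x, mask, W, mu, psi)   # a per-pattern "encoder"
```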
COPA: Constrained PARAFAC2 for Sparse & Large Datasets
Title | COPA: Constrained PARAFAC2 for Sparse & Large Datasets |
Authors | Ardavan Afshar, Ioakeim Perros, Evangelos E. Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun |
Abstract | PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with a varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36 times faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04572v2 |
http://arxiv.org/pdf/1803.04572v2.pdf | |
PWC | https://paperswithcode.com/paper/copa-constrained-parafac2-for-sparse-large |
Repo | https://github.com/aafshar/COPA |
Framework | none |
Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
Title | Unsupervised Learning of Syntactic Structure with Invertible Neural Projections |
Authors | Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick |
Abstract | Unsupervised learning of syntactic structure is typically performed using generative models with discrete latent variables and multinomial parameters. In most cases, these models have not leveraged continuous word representations. In this work, we propose a novel generative model that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior. We show that the invertibility condition allows for efficient exact inference and marginal likelihood computation in our model so long as the prior is well-behaved. In experiments we instantiate our approach with both Markov and tree-structured priors, evaluating on two tasks: part-of-speech (POS) induction, and unsupervised dependency parsing without gold POS annotation. On the Penn Treebank, our Markov-structured model surpasses state-of-the-art results on POS induction. Similarly, we find that our tree-structured model achieves state-of-the-art performance on unsupervised dependency parsing for the difficult training condition where neither gold POS annotation nor punctuation-based constraints are available. |
Tasks | Constituency Grammar Induction, Dependency Parsing |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09111v1 |
http://arxiv.org/pdf/1808.09111v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-syntactic-structure |
Repo | https://github.com/jxhe/struct-learning-with-flow |
Framework | pytorch |
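The marginal likelihood in this framework is a change of variables: an invertible projection maps pre-trained word embeddings into a space scored by the structured prior, plus the log-determinant of the Jacobian. The sketch below uses an invertible affine map in place of the neural flow and a standard normal in place of the Markov or tree-structured prior; both are stand-ins, not the paper's components.

```python
import numpy as np

def projected_log_likelihood(x, A, b, prior_logpdf):
    """log p(x) = log p_prior(f(x)) + log |det df/dx| for the invertible map f(x) = A x + b.

    The affine map stands in for the paper's invertible neural projection;
    prior_logpdf stands in for the structured (Markov / tree) prior.
    """
    z = A @ x + b
    _, log_abs_det = np.linalg.slogdet(A)   # log |det Jacobian| of an affine map
    return prior_logpdf(z) + log_abs_det

def std_normal_logpdf(z):
    return -0.5 * (z @ z + len(z) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)) + d * np.eye(d)   # well-conditioned, hence invertible
x = rng.normal(size=d)                        # stand-in for a word embedding
print(projected_log_likelihood(x, A, np.zeros(d), std_normal_logpdf))
```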
Training VAEs Under Structured Residuals
Title | Training VAEs Under Structured Residuals |
Authors | Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson |
Abstract | Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer range correlations. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.01050v3 |
http://arxiv.org/pdf/1804.01050v3.pdf | |
PWC | https://paperswithcode.com/paper/training-vaes-under-structured-residuals |
Repo | https://github.com/Garoe/tf_mvg |
Framework | tf |
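The structured likelihood replaces the factorized per-pixel Gaussian with a full covariance, typically parameterised through a Cholesky factor so the log-likelihood stays tractable. A minimal sketch of that likelihood term follows; in the model the Cholesky factor would be predicted by the decoder, whereas here it is simply an input.

```python
import numpy as np

def structured_gaussian_nll(x, mean, chol):
    """Negative log-likelihood of x under N(mean, L L^T), with L lower-triangular.

    A decoder that predicts L (instead of a diagonal variance) lets the model
    capture correlated residuals between pixels.
    """
    d = x.shape[0]
    resid = x - mean
    # Solve L y = resid rather than forming the covariance inverse explicitly
    y = np.linalg.solve(chol, resid)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (y @ y + log_det + d * np.log(2 * np.pi))

rng = np.random.default_rng(0)
d = 5
L = np.tril(rng.normal(size=(d, d)))
np.fill_diagonal(L, np.abs(np.diag(L)) + 0.5)   # keep the factor well-conditioned
x, mean = rng.normal(size=d), np.zeros(d)
print(structured_gaussian_nll(x, mean, L))
```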
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
Title | SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos |
Authors | Silvio Giancola, Mohieddine Amine, Tarek Dghaily, Bernard Ghanem |
Abstract | In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet. |
Tasks | Action Classification, Action Detection, Action Spotting |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04527v2 |
http://arxiv.org/pdf/1804.04527v2.pdf | |
PWC | https://paperswithcode.com/paper/soccernet-a-scalable-dataset-for-action |
Repo | https://github.com/SilvioGiancola/SoccerNet-code |
Framework | tf |
Restricted Boltzmann Machines: Introduction and Review
Title | Restricted Boltzmann Machines: Introduction and Review |
Authors | Guido Montufar |
Abstract | The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07066v1 |
http://arxiv.org/pdf/1806.07066v1.pdf | |
PWC | https://paperswithcode.com/paper/restricted-boltzmann-machines-introduction |
Repo | https://github.com/Kevin-Sean-Chen/Restriced_Boltzmann_Machine |
Framework | none |
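A binary RBM with visible units v, hidden units h, weights W, and biases a, b has energy E(v, h) = -a'v - b'h - v'Wh; the bipartite interaction makes both conditionals factorize into independent sigmoids, which is what keeps block Gibbs sampling cheap. A minimal sketch of one Gibbs step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One block-Gibbs update for a binary RBM.

    Energy E(v, h) = -a.v - b.h - v.W.h, so
      p(h_j = 1 | v) = sigmoid(b_j + v . W[:, j])
      p(v_i = 1 | h) = sigmoid(a_i + W[i, :] . h)
    """
    p_h = sigmoid(b + v @ W)
    h = (rng.uniform(size=p_h.shape) < p_h).astype(float)
    p_v = sigmoid(a + h @ W.T)
    v_new = (rng.uniform(size=p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
n_v, n_h = 6, 3
W = rng.normal(scale=0.1, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
v = rng.integers(0, 2, size=n_v).astype(float)
for _ in range(10):
    v, h = gibbs_step(v, W, a, b, rng)
```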
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Title | ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation |
Authors | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez |
Abstract | Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging “synthetic-2-real” set-ups and show that the approach can also be used for detection. |
Tasks | Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12833v2 |
http://arxiv.org/pdf/1811.12833v2.pdf | |
PWC | https://paperswithcode.com/paper/advent-adversarial-entropy-minimization-for |
Repo | https://github.com/valeoai/ADVENT |
Framework | pytorch |
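The direct variant of the entropy loss penalizes high-entropy pixel-wise predictions on unlabeled target images (the adversarial variant instead aligns weighted self-information maps across domains). A minimal sketch of that entropy term on a softmax output; normalizing by log C to keep values in [0, 1] is an assumption here.

```python
import numpy as np

def entropy_loss(probs, eps=1e-12):
    """Mean normalized Shannon entropy of pixel-wise class probabilities.

    probs : (H, W, C) softmax output of the segmentation network.
    Minimizing this on unlabeled target images pushes predictions
    toward confident (low-entropy) decisions.
    """
    C = probs.shape[-1]
    ent = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(C)  # (H, W), in [0, 1]
    return ent.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 19))                        # e.g. 19 Cityscapes classes
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(entropy_loss(probs))
```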
Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models
Title | Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models |
Authors | Anton Belyy |
Abstract | In our work, we propose to represent a hierarchical topic model (HTM) as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for the edges of hierarchical models, resembling those proposed for flat models. We conduct an assessment experiment and show strong correlation between the proposed measures and human judgement of topical edge quality. We also introduce a heterogeneous algorithm to build hierarchical topic models for heterogeneous data sources. We show how making certain adjustments to the learning process helps to retain the original structure of customized models while allowing slight coherent modifications for new documents. We evaluate this approach using the proposed measures and show that the proposed heterogeneous algorithm significantly outperforms the baseline concat approach. Finally, we implement our own ESE called Rysearch, which demonstrates the potential of the ARTM approach for visualizing large heterogeneous document collections. |
Tasks | Topic Models |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02820v1 |
http://arxiv.org/pdf/1811.02820v1.pdf | |
PWC | https://paperswithcode.com/paper/construction-and-quality-evaluation-of |
Repo | https://github.com/AVBelyy/Rysearch |
Framework | none |
Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols
Title | Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols |
Authors | Siddharth Pramod |
Abstract | Distributing neural network training is of particular interest for several reasons, including scaling using computing clusters, training at data sources such as IoT devices and edge servers, and utilizing underutilized resources across heterogeneous environments. Most contemporary approaches primarily address scaling using computing clusters and require high network bandwidth and frequent communication. This thesis presents an overview of standard approaches to distributed training and proposes a novel technique involving pairwise communication using Gossip-like protocols, called Elastic Gossip. This approach builds upon an existing technique known as Elastic Averaging SGD (EASGD), and is similar to another technique called Gossiping SGD, which also uses Gossip-like protocols. Elastic Gossip is empirically evaluated against Gossiping SGD on the MNIST digit recognition and CIFAR-10 classification tasks, using commonly used neural network architectures spanning Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). It is found that Elastic Gossip, Gossiping SGD, and All-reduce SGD perform quite comparably, even though the latter entails a substantially higher communication cost. While Elastic Gossip performs better than Gossiping SGD in these experiments, it is possible that a more thorough search over the hyper-parameter space, specific to a given application, may yield configurations of Gossiping SGD that work better than Elastic Gossip. |
Tasks | |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02407v1 |
http://arxiv.org/pdf/1812.02407v1.pdf | |
PWC | https://paperswithcode.com/paper/elastic-gossip-distributing-neural-network |
Repo | https://github.com/sidps/dist_training |
Framework | pytorch |
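Elastic Gossip replaces the central coordinator of EASGD with pairwise exchanges: when two workers gossip, each moves its parameters toward the other's by an elastic term, in between their local SGD steps. The sketch below shows that pairwise update; the elasticity constant alpha and the gossip schedule are illustrative choices, not the thesis's exact settings.

```python
import numpy as np

def gossip_exchange(theta_i, theta_j, alpha=0.5):
    """Pairwise elastic update between two workers' parameter vectors.

    Each worker moves toward the other by alpha times their difference;
    with alpha = 0.5 both land on the midpoint (plain averaging).
    """
    diff = theta_i - theta_j
    return theta_i - alpha * diff, theta_j + alpha * diff

# Toy run: two workers drift apart via local "SGD" noise, then occasionally gossip
rng = np.random.default_rng(0)
theta_a, theta_b = np.zeros(4), np.zeros(4)
for step in range(100):
    theta_a += rng.normal(scale=0.01, size=4)   # stand-in for local gradient steps
    theta_b += rng.normal(scale=0.01, size=4)
    if step % 10 == 0:                          # pairwise gossip every 10 steps
        theta_a, theta_b = gossip_exchange(theta_a, theta_b, alpha=0.3)
print(theta_a, theta_b)
```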