Paper Group AWR 77
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites. Not All Samples Are Created Equal: Deep Learning with Importance Sampling. GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination. Decoupled Networks. Gaussian Mixture Latent Vector Grammars. CGMH: Constrained Sentence Generation by Metropolis-Hastings …
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites
Title | ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites |
Authors | Jiankai Sun, Abhinav Vishnu, Aniket Chakrabarti, Charles Siegel, Srinivasan Parthasarathy |
Abstract | Routing questions in Community Question Answering services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start – a phenomenon observed when a new question is posted – is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00462v1 |
http://arxiv.org/pdf/1807.00462v1.pdf | |
PWC | https://paperswithcode.com/paper/coldroute-effective-routing-of-cold-questions |
Repo | https://github.com/zhenv5/ColdRoute |
Framework | none |
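The routing step described above scores a cold question against candidate experts with a Factorization Machine over one-hot features such as question tags and asker/expert identifiers. Below is a minimal numpy sketch of the second-order FM scoring function; the feature layout and dimensions are illustrative, not taken from the paper.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order Factorization Machine score for a one-hot/sparse feature vector x.

    x  : (d,)   binary indicator features (e.g. question tags, asker id, expert id)
    w0 : scalar bias
    w  : (d,)   linear weights
    V  : (d, k) latent factors; the pairwise weight for features (i, j) is <V[i], V[j]>
    """
    linear = w0 + w @ x
    # O(d*k) reformulation of the pairwise interaction term
    s = V.T @ x                    # (k,)
    s_sq = (V ** 2).T @ (x ** 2)   # (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

# Toy usage: score one candidate expert for a cold question
d, k = 10, 4
rng = np.random.default_rng(0)
x = np.zeros(d)
x[[1, 3, 7]] = 1.0   # active one-hot features
print(fm_score(x, 0.1, rng.normal(size=d), rng.normal(scale=0.1, size=(d, k))))
```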
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
Title | Not All Samples Are Created Equal: Deep Learning with Importance Sampling |
Authors | Angelos Katharopoulos, François Fleuret |
Abstract | Deep neural network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on “informative” examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the training loss of up to an order of magnitude and a relative improvement of test errors between 5% and 17%. |
Tasks | Image Classification |
Published | 2018-03-02 |
URL | https://arxiv.org/abs/1803.00942v3 |
https://arxiv.org/pdf/1803.00942v3.pdf | |
PWC | https://paperswithcode.com/paper/not-all-samples-are-created-equal-deep |
Repo | https://github.com/idiap/importance-sampling |
Framework | tf |
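The scheme samples training examples with probability proportional to an upper bound on their per-sample gradient norm and reweights the sampled gradients so the SGD update stays unbiased. A minimal sketch of that sampling-and-reweighting step, with placeholder scores standing in for the paper's bound:

```python
import numpy as np

def importance_sample(scores, batch_size, rng):
    """Sample indices proportionally to per-sample scores and return unbiasing weights.

    scores : (N,) nonnegative per-sample importance scores
             (e.g. an upper bound on the per-sample gradient norm)
    Returns (idx, w) where w[i] = 1 / (N * p[idx[i]]), so that the weighted
    gradient sum remains an unbiased estimate of the full-batch gradient.
    """
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=batch_size, replace=True, p=p)
    w = 1.0 / (len(scores) * p[idx])
    return idx, w

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)            # stand-in for gradient-norm bounds
idx, w = importance_sample(scores, 32, rng)
# Each sampled gradient g_i would then be scaled by w[i] before the SGD update.
```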
GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination
Title | GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination |
Authors | Junyuan Shang, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun |
Abstract | Recent progress in deep learning is revolutionizing the healthcare domain, including providing solutions for medication recommendation, especially recommending medication combinations for patients with complex health conditions. Existing approaches either do not customize based on patient health history or ignore existing knowledge on drug-drug interactions (DDI) that might lead to adverse outcomes. To fill this gap, we propose Graph Augmented Memory Networks (GAMENet), which integrates the drug-drug interaction knowledge graph via a memory module implemented as graph convolutional networks, and models longitudinal patient records as the query. It is trained end-to-end to provide safe and personalized recommendations of medication combinations. We demonstrate the effectiveness and safety of GAMENet by comparing it with several state-of-the-art methods on real EHR data. GAMENet outperformed all baselines in all effectiveness measures, and also achieved a 3.60% DDI rate reduction relative to existing EHR data. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01852v3 |
http://arxiv.org/pdf/1809.01852v3.pdf | |
PWC | https://paperswithcode.com/paper/gamenet-graph-augmented-memory-networks-for |
Repo | https://github.com/sjy1203/GAMENet |
Framework | pytorch |
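The memory module keys drug representations computed by graph convolutional networks over the DDI knowledge graph. Below is a minimal sketch of one symmetrically normalized GCN propagation step of the kind such a module could use; this is a generic GCN layer, not the authors' exact architecture.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A : (n, n) adjacency matrix of the drug-drug interaction graph
    H : (n, f_in) node (drug) embeddings
    W : (f_in, f_out) trainable weights
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy DDI graph with 4 drugs
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8))
memory_keys = gcn_layer(A, H, W)   # rows could serve as memory keys/values
```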
Decoupled Networks
Title | Decoupled Networks |
Authors | Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song |
Abstract | Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness. |
Tasks | |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08071v1 |
http://arxiv.org/pdf/1804.08071v1.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-networks |
Repo | https://github.com/yujiacheng333/BaseDcLayer |
Framework | tf |
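The decoupled operator rewrites the inner product w.x = ||w|| ||x|| cos(theta) as h(||w||, ||x||) * g(theta), so that magnitude (intra-class variation) and angle (semantic difference) are modeled separately. The sketch below uses one illustrative instantiation, a saturating magnitude function with a cosine angular activation; the paper proposes several concrete variants.

```python
import numpy as np

def decoupled_response(w, x, alpha=1.0):
    """Decoupled operator f(w, x) = h(||w||, ||x||) * g(theta).

    Here h is a saturating magnitude function alpha * ||x|| / (1 + ||x||)
    (one illustrative choice) and g(theta) = cos(theta), in contrast to the
    standard inner product where h = ||w|| * ||x|| and g = cos(theta).
    """
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_theta = w @ x / (nw * nx + 1e-12)
    h = alpha * nx / (1.0 + nx)   # intra-class variation (magnitude)
    g = cos_theta                 # semantic difference (angle)
    return h * g

w = np.array([1.0, 2.0, -0.5])
x = np.array([0.3, 1.1, 0.4])
print(decoupled_response(w, x), w @ x)   # decoupled vs. ordinary inner product
```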
Gaussian Mixture Latent Vector Grammars
Title | Gaussian Mixture Latent Vector Grammars |
Authors | Yanpeng Zhao, Liwen Zhang, Kewei Tu |
Abstract | We introduce Latent Vector Grammars (LVeGs), a new framework that extends latent variable grammars such that each nonterminal symbol is associated with a continuous vector space representing the set of (infinitely many) subtypes of the nonterminal. We show that previous models such as latent variable grammars and compositional vector grammars can be interpreted as special cases of LVeGs. We then present Gaussian Mixture LVeGs (GM-LVeGs), a new special case of LVeGs that uses Gaussian mixtures to formulate the weights of production rules over subtypes of nonterminals. A major advantage of using Gaussian mixtures is that the partition function and the expectations of subtype rules can be computed using an extension of the inside-outside algorithm, which enables efficient inference and learning. We apply GM-LVeGs to part-of-speech tagging and constituency parsing and show that GM-LVeGs can achieve competitive accuracies. Our code is available at https://github.com/zhaoyanpeng/lveg. |
Tasks | Constituency Parsing, Part-Of-Speech Tagging |
Published | 2018-05-12 |
URL | http://arxiv.org/abs/1805.04688v1 |
http://arxiv.org/pdf/1805.04688v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-mixture-latent-vector-grammars |
Repo | https://github.com/zhaoyanpeng/lveg |
Framework | none |
CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling
Title | CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling |
Authors | Ning Miao, Hao Zhou, Lili Mou, Rui Yan, Lei Li |
Abstract | In real-world applications of natural language generation, there are often constraints on the target sentences in addition to fluency and naturalness requirements. Existing language generation techniques are usually based on recurrent neural networks (RNNs). However, it is non-trivial to impose constraints on RNNs while maintaining generation quality, since RNNs generate sentences sequentially (or with beam search) from the first word to the last. In this paper, we propose CGMH, a novel approach using Metropolis-Hastings sampling for constrained sentence generation. CGMH allows complicated constraints such as the occurrence of multiple keywords in the target sentences, which cannot be handled in traditional RNN-based approaches. Moreover, CGMH works in the inference stage, and does not require parallel corpora for training. We evaluate our method on a variety of tasks, including keywords-to-sentence generation, unsupervised sentence paraphrasing, and unsupervised sentence error correction. CGMH achieves high performance compared with previous supervised methods for sentence generation. Our code is released at https://github.com/NingMiao/CGMH |
Tasks | Text Generation |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.10996v1 |
http://arxiv.org/pdf/1811.10996v1.pdf | |
PWC | https://paperswithcode.com/paper/cgmh-constrained-sentence-generation-by |
Repo | https://github.com/NingMiao/CGMH |
Framework | tf |
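CGMH edits a sentence one position at a time (replacing, inserting, or deleting a word) and accepts each proposal with a Metropolis-Hastings test on a sentence score, e.g. a language-model probability multiplied by constraint indicators. The sketch below is a toy version of that accept/reject loop: the scorer is a placeholder and the acceptance ratio omits the proposal-probability correction used in the full method.

```python
import random

def sentence_score(words, keywords):
    """Placeholder stationary density: keyword constraint times a dummy fluency term.
    In CGMH this would be a language-model probability times constraint indicators."""
    if not all(k in words for k in keywords):
        return 0.0
    return 1.0 / (1.0 + abs(len(words) - 8))   # toy preference for ~8-word sentences

def mh_step(words, vocab, keywords, rng=random):
    pos = rng.randrange(len(words))
    op = rng.choice(["replace", "insert", "delete"])
    proposal = list(words)
    if op == "replace":
        proposal[pos] = rng.choice(vocab)
    elif op == "insert":
        proposal.insert(pos, rng.choice(vocab))
    elif op == "delete" and len(proposal) > 1:
        del proposal[pos]
    old, new = sentence_score(words, keywords), sentence_score(proposal, keywords)
    # Simplified MH acceptance: always move out of zero-density states
    accept = old == 0.0 or rng.random() < min(1.0, new / old)
    return proposal if accept else words

words = ["machine", "learning", "is", "fun"]
vocab = ["deep", "models", "generate", "text", "with", "constraints", "fluently"]
for _ in range(200):
    words = mh_step(words, vocab, ["machine", "text"])
print(words)
```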
Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case
Title | Autoencoders and Probabilistic Inference with Missing Data: An Exact Solution for The Factor Analysis Case |
Authors | Christopher K. I. Williams, Charlie Nash, Alfredo Nazábal |
Abstract | Latent variable models can be used to probabilistically “fill-in” missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a “recognition” or “encoder” network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the factor analysis (FA) model in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to filling in the missing data. |
Tasks | Latent Variable Models |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03851v3 |
http://arxiv.org/pdf/1801.03851v3.pdf | |
PWC | https://paperswithcode.com/paper/autoencoders-and-probabilistic-inference-with |
Repo | https://github.com/Kismuz/crypto_spread_test |
Framework | tf |
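For the standard factor analysis model x = Wz + mu + eps with eps ~ N(0, Psi) diagonal and z ~ N(0, I), the exact latent posterior given only the observed entries follows by restricting the rows of W and Psi to the observed dimensions, which is why each missingness pattern effectively needs its own encoder. A minimal numpy sketch, assuming this standard parameterisation:

```python
import numpy as np

def fa_posterior_missing(x, observed, W, mu, psi):
    """Exact posterior p(z | x_o) for factor analysis with missing data.

    x        : (d,) data vector (entries at unobserved positions are ignored)
    observed : (d,) boolean mask of observed entries
    W        : (d, k) factor loading matrix
    mu       : (d,) mean
    psi      : (d,) diagonal noise variances
    Returns the posterior mean (k,) and covariance (k, k) of the latent z.
    """
    Wo = W[observed]                      # loadings restricted to observed rows
    xo = x[observed] - mu[observed]       # centred observed entries
    psi_o_inv = 1.0 / psi[observed]
    k = W.shape[1]
    cov = np.linalg.inv(np.eye(k) + Wo.T @ (psi_o_inv[:, None] * Wo))
    mean = cov @ Wo.T @ (psi_o_inv * xo)
    return mean, cov

rng = np.random.default_rng(0)
d, k = 6, 2
W, mu, psi = rng.normal(size=(d, k)), np.zeros(d), np.full(d, 0.5)
x = rng.normal(size=d)
mask = np.array([True, True, False, True, False, True])
m, S = fa_posterior_missing(x, mask, W, mu, psi)   # a per-pattern "encoder"
```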
COPA: Constrained PARAFAC2 for Sparse & Large Datasets
Title | COPA: Constrained PARAFAC2 for Sparse & Large Datasets |
Authors | Ardavan Afshar, Ioakeim Perros, Evangelos E. Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun |
Abstract | PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with a varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36 times faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04572v2 |
http://arxiv.org/pdf/1803.04572v2.pdf | |
PWC | https://paperswithcode.com/paper/copa-constrained-parafac2-for-sparse-large |
Repo | https://github.com/aafshar/COPA |
Framework | none |
Unsupervised Learning of Syntactic Structure with Invertible Neural Projections
Title | Unsupervised Learning of Syntactic Structure with Invertible Neural Projections |
Authors | Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick |
Abstract | Unsupervised learning of syntactic structure is typically performed using generative models with discrete latent variables and multinomial parameters. In most cases, these models have not leveraged continuous word representations. In this work, we propose a novel generative model that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior. We show that the invertibility condition allows for efficient exact inference and marginal likelihood computation in our model so long as the prior is well-behaved. In experiments we instantiate our approach with both Markov and tree-structured priors, evaluating on two tasks: part-of-speech (POS) induction, and unsupervised dependency parsing without gold POS annotation. On the Penn Treebank, our Markov-structured model surpasses state-of-the-art results on POS induction. Similarly, we find that our tree-structured model achieves state-of-the-art performance on unsupervised dependency parsing for the difficult training condition where neither gold POS annotation nor punctuation-based constraints are available. |
Tasks | Constituency Grammar Induction, Dependency Parsing |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09111v1 |
http://arxiv.org/pdf/1808.09111v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-syntactic-structure |
Repo | https://github.com/jxhe/struct-learning-with-flow |
Framework | pytorch |
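The marginal likelihood in this framework is a change of variables: an invertible projection maps pre-trained word embeddings into a space scored by the structured prior, plus the log-determinant of the Jacobian. The sketch below uses an invertible affine map in place of the neural flow and a standard normal in place of the Markov or tree-structured prior; both are stand-ins, not the paper's components.

```python
import numpy as np

def projected_log_likelihood(x, A, b, prior_logpdf):
    """log p(x) = log p_prior(f(x)) + log |det df/dx| for the invertible map f(x) = A x + b.

    The affine map stands in for the paper's invertible neural projection;
    prior_logpdf stands in for the structured (Markov / tree) prior.
    """
    z = A @ x + b
    _, log_abs_det = np.linalg.slogdet(A)   # log |det Jacobian| of an affine map
    return prior_logpdf(z) + log_abs_det

def std_normal_logpdf(z):
    return -0.5 * (z @ z + len(z) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)) + d * np.eye(d)   # well-conditioned, hence invertible
x = rng.normal(size=d)                        # stand-in for a word embedding
print(projected_log_likelihood(x, A, np.zeros(d), std_normal_logpdf))
```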
Training VAEs Under Structured Residuals
Title | Training VAEs Under Structured Residuals |
Authors | Garoe Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson |
Abstract | Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer range correlations. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.01050v3 |
http://arxiv.org/pdf/1804.01050v3.pdf | |
PWC | https://paperswithcode.com/paper/training-vaes-under-structured-residuals |
Repo | https://github.com/Garoe/tf_mvg |
Framework | tf |
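The structured likelihood replaces the factorized per-pixel Gaussian with a full covariance, typically parameterised through a Cholesky factor so the log-likelihood stays tractable. A minimal sketch of that likelihood term follows; in the model the Cholesky factor would be predicted by the decoder, whereas here it is simply an input.

```python
import numpy as np

def structured_gaussian_nll(x, mean, chol):
    """Negative log-likelihood of x under N(mean, L L^T), with L lower-triangular.

    A decoder that predicts L (instead of a diagonal variance) lets the model
    capture correlated residuals between pixels.
    """
    d = x.shape[0]
    resid = x - mean
    # Solve L y = resid rather than forming the covariance inverse explicitly
    y = np.linalg.solve(chol, resid)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (y @ y + log_det + d * np.log(2 * np.pi))

rng = np.random.default_rng(0)
d = 5
L = np.tril(rng.normal(size=(d, d)))
np.fill_diagonal(L, np.abs(np.diag(L)) + 0.5)   # keep the factor well-conditioned
x, mean = rng.normal(size=d), np.zeros(d)
print(structured_gaussian_nll(x, mean, L))
```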
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
Title | SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos |
Authors | Silvio Giancola, Mohieddine Amine, Tarek Dghaily, Bernard Ghanem |
Abstract | In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds. Our dataset and models are available at https://silviogiancola.github.io/SoccerNet. |
Tasks | Action Classification, Action Detection, Action Spotting |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04527v2 |
http://arxiv.org/pdf/1804.04527v2.pdf | |
PWC | https://paperswithcode.com/paper/soccernet-a-scalable-dataset-for-action |
Repo | https://github.com/SilvioGiancola/SoccerNet-code |
Framework | tf |
Restricted Boltzmann Machines: Introduction and Review
Title | Restricted Boltzmann Machines: Introduction and Review |
Authors | Guido Montufar |
Abstract | The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07066v1 |
http://arxiv.org/pdf/1806.07066v1.pdf | |
PWC | https://paperswithcode.com/paper/restricted-boltzmann-machines-introduction |
Repo | https://github.com/Kevin-Sean-Chen/Restriced_Boltzmann_Machine |
Framework | none |
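A binary RBM with visible units v, hidden units h, weights W, and biases a, b has energy E(v, h) = -a'v - b'h - v'Wh; the bipartite interaction makes both conditionals factorize into independent sigmoids, which is what keeps block Gibbs sampling cheap. A minimal sketch of one Gibbs step:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b, rng):
    """One block-Gibbs update for a binary RBM.

    Energy E(v, h) = -a.v - b.h - v.W.h, so
      p(h_j = 1 | v) = sigmoid(b_j + v . W[:, j])
      p(v_i = 1 | h) = sigmoid(a_i + W[i, :] . h)
    """
    p_h = sigmoid(b + v @ W)
    h = (rng.uniform(size=p_h.shape) < p_h).astype(float)
    p_v = sigmoid(a + h @ W.T)
    v_new = (rng.uniform(size=p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
n_v, n_h = 6, 3
W = rng.normal(scale=0.1, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
v = rng.integers(0, 2, size=n_v).astype(float)
for _ in range(10):
    v, h = gibbs_step(v, W, a, b, rng)
```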
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Title | ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation |
Authors | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez |
Abstract | Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging “synthetic-2-real” set-ups and show that the approach can also be used for detection. |
Tasks | Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12833v2 |
http://arxiv.org/pdf/1811.12833v2.pdf | |
PWC | https://paperswithcode.com/paper/advent-adversarial-entropy-minimization-for |
Repo | https://github.com/valeoai/ADVENT |
Framework | pytorch |
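The direct variant of the entropy loss penalizes high-entropy pixel-wise predictions on unlabeled target images (the adversarial variant instead aligns weighted self-information maps across domains). A minimal sketch of that entropy term on a softmax output; normalizing by log C to keep values in [0, 1] is an assumption here.

```python
import numpy as np

def entropy_loss(probs, eps=1e-12):
    """Mean normalized Shannon entropy of pixel-wise class probabilities.

    probs : (H, W, C) softmax output of the segmentation network.
    Minimizing this on unlabeled target images pushes predictions
    toward confident (low-entropy) decisions.
    """
    C = probs.shape[-1]
    ent = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(C)  # (H, W), in [0, 1]
    return ent.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 19))                        # e.g. 19 Cityscapes classes
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
print(entropy_loss(probs))
```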
Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models
Title | Construction and Quality Evaluation of Heterogeneous Hierarchical Topic Models |
Authors | Anton Belyy |
Abstract | In our work, we propose to represent a hierarchical topic model (HTM) as a set of flat models, or layers, and a set of topical hierarchies, or edges. We suggest several quality measures for the edges of hierarchical models, resembling those proposed for flat models. We conduct an assessment experiment and show strong correlation between the proposed measures and human judgement of topical edge quality. We also introduce a heterogeneous algorithm to build hierarchical topic models for heterogeneous data sources. We show how making certain adjustments to the learning process helps to retain the original structure of customized models while allowing slight coherent modifications for new documents. We evaluate this approach using the proposed measures and show that the proposed heterogeneous algorithm significantly outperforms the baseline concat approach. Finally, we implement our own ESE called Rysearch, which demonstrates the potential of the ARTM approach for visualizing large heterogeneous document collections. |
Tasks | Topic Models |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02820v1 |
http://arxiv.org/pdf/1811.02820v1.pdf | |
PWC | https://paperswithcode.com/paper/construction-and-quality-evaluation-of |
Repo | https://github.com/AVBelyy/Rysearch |
Framework | none |
Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols
Title | Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols |
Authors | Siddharth Pramod |
Abstract | Distributing neural network training is of particular interest for several reasons, including scaling using computing clusters, training at data sources such as IoT devices and edge servers, and utilizing underutilized resources across heterogeneous environments. Most contemporary approaches primarily address scaling using computing clusters and require high network bandwidth and frequent communication. This thesis presents an overview of standard approaches to distributed training and proposes a novel technique involving pairwise communication using Gossip-like protocols, called Elastic Gossip. This approach builds upon an existing technique known as Elastic Averaging SGD (EASGD), and is similar to another technique called Gossiping SGD, which also uses Gossip-like protocols. Elastic Gossip is empirically evaluated against Gossiping SGD on the MNIST digit recognition and CIFAR-10 classification tasks, using commonly used neural network architectures spanning Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). It is found that Elastic Gossip, Gossiping SGD, and All-reduce SGD perform quite comparably, even though the latter entails a substantially higher communication cost. While Elastic Gossip performs better than Gossiping SGD in these experiments, it is possible that a more thorough search over the hyper-parameter space, specific to a given application, may yield configurations of Gossiping SGD that work better than Elastic Gossip. |
Tasks | |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02407v1 |
http://arxiv.org/pdf/1812.02407v1.pdf | |
PWC | https://paperswithcode.com/paper/elastic-gossip-distributing-neural-network |
Repo | https://github.com/sidps/dist_training |
Framework | pytorch |
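Elastic Gossip replaces the central coordinator of EASGD with pairwise exchanges: when two workers gossip, each moves its parameters toward the other's by an elastic term, in between their local SGD steps. The sketch below shows that pairwise update; the elasticity constant alpha and the gossip schedule are illustrative choices, not the thesis's exact settings.

```python
import numpy as np

def gossip_exchange(theta_i, theta_j, alpha=0.5):
    """Pairwise elastic update between two workers' parameter vectors.

    Each worker moves toward the other by alpha times their difference;
    with alpha = 0.5 both land on the midpoint (plain averaging).
    """
    diff = theta_i - theta_j
    return theta_i - alpha * diff, theta_j + alpha * diff

# Toy run: two workers drift apart via local "SGD" noise, then occasionally gossip
rng = np.random.default_rng(0)
theta_a, theta_b = np.zeros(4), np.zeros(4)
for step in range(100):
    theta_a += rng.normal(scale=0.01, size=4)   # stand-in for local gradient steps
    theta_b += rng.normal(scale=0.01, size=4)
    if step % 10 == 0:                          # pairwise gossip every 10 steps
        theta_a, theta_b = gossip_exchange(theta_a, theta_b, alpha=0.3)
print(theta_a, theta_b)
```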