October 16, 2019

2986 words 15 mins read

Paper Group ANR 1080

RNNs Implicitly Implement Tensor Product Representations. Learning the effect of latent variables in Gaussian Graphical models with unobserved variables. An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text. How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?. PANDA: Facilitating Usable AI Development. …

RNNs Implicitly Implement Tensor Product Representations

Title RNNs Implicitly Implement Tensor Product Representations
Authors R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky
Abstract Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively combine tensor products of vectors representing roles (e.g., sequence positions) and vectors representing fillers (e.g., particular words). To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally-occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures. We conclude that TPDNs provide a powerful method for interpreting vector representations, and that standard RNNs can induce compositional sequence representations that are remarkably well approximated by TPRs; at the same time, existing training tasks for sentence representation learning may not be sufficient for inducing robust structural representations.
Tasks Representation Learning
Published 2018-12-20
URL http://arxiv.org/abs/1812.08718v2
PDF http://arxiv.org/pdf/1812.08718v2.pdf
PWC https://paperswithcode.com/paper/rnns-implicitly-implement-tensor-product-1
Repo
Framework
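
As a rough, self-contained illustration of the tensor product representations the paper probes for (not the authors' TPDN code), the sketch below sums outer products of filler vectors (words) and role vectors (positions) and then unbinds a filler with its role vector; the embedding sizes and random vectors are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy filler (word) and role (position) embeddings -- illustrative only.
fillers = {w: rng.normal(size=16) for w in ["the", "cat", "sat"]}
roles = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # orthonormal position roles

def tpr(sequence):
    """Tensor product representation: sum over positions of filler (outer) role."""
    return sum(np.outer(fillers[w], roles[i]) for i, w in enumerate(sequence))

def unbind(T, position):
    """Recover the filler bound to a role via the role vector (its own dual here)."""
    return T @ roles[position]

T = tpr(["the", "cat", "sat"])
recovered = unbind(T, 1)
best = max(fillers, key=lambda w: fillers[w] @ recovered)
print(best)  # prints "cat": orthonormal roles make unbinding exact
```

Because the role vectors are orthonormal, unbinding recovers each filler exactly; a TPDN fits such role/filler vectors to approximate an RNN's learned encodings.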

Learning the effect of latent variables in Gaussian Graphical models with unobserved variables

Title Learning the effect of latent variables in Gaussian Graphical models with unobserved variables
Authors Marina Vinyes, Guillaume Obozinski
Abstract The edge structure of the graph defining an undirected graphical model describes precisely the structure of dependence between the variables in the graph. In many applications, the dependence structure is unknown and it is desirable to learn it from data, often because it is a preliminary step to being able to ascertain causal effects. This problem, known as structure learning, is hard in general, but for Gaussian graphical models it is slightly easier because the structure of the graph is given by the sparsity pattern of the precision matrix of the joint distribution, and because independence coincides with decorrelation. A major difficulty, too often ignored in structure learning, is that if some variables are not observed, the marginal dependence graph over the observed variables may be significantly more complex and no longer reflect the direct dependencies that are potentially associated with causal effects. In this work, we consider a family of latent variable Gaussian graphical models in which the graph of the joint distribution between observed and unobserved variables is sparse, and the unobserved variables are conditionally independent given the others. Prior work was able to recover the connectivity between observed variables but could only identify the subspace spanned by the unobserved variables. We instead propose a convex optimization formulation based on structured matrix sparsity to estimate the connectivity of the complete graph, including the unobserved variables, given the number of missing variables and a priori knowledge of their level of connectivity. Our formulation is supported by a theoretical identifiability result for the latent dependence structure of sparse graphs in the infinite data limit. We propose an algorithm leveraging recent active set methods, which performs well in experiments on synthetic data.
Tasks
Published 2018-07-20
URL http://arxiv.org/abs/1807.07754v2
PDF http://arxiv.org/pdf/1807.07754v2.pdf
PWC https://paperswithcode.com/paper/learning-the-effect-of-latent-variables-in
Repo
Framework
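
The difficulty the abstract emphasizes, that marginalizing out hidden variables makes the observed-variable graph denser, can be made concrete with the Schur complement of the joint precision matrix. The numerical sketch below (an illustration of that effect, not the paper's estimator) builds a sparse star-shaped joint precision and shows that the marginal precision over the observed block is fully connected.

```python
import numpy as np

# Sparse joint precision K over 4 observed (O) and 1 hidden (H) variable:
# the observed variables are pairwise non-adjacent but all connect to the hidden one.
K = np.eye(5)
K[:4, 4] = K[4, :4] = 0.4                      # edges O_i -- H only

# Marginal precision of the observed block is the Schur complement
# K_OO - K_OH K_HH^{-1} K_HO, which fills in with extra edges.
K_OO, K_OH, K_HH = K[:4, :4], K[:4, 4:], K[4:, 4:]
K_marginal = K_OO - K_OH @ np.linalg.inv(K_HH) @ K_OH.T

print(np.round(K_marginal, 2))
# Off-diagonal entries are now all -0.16: the observed graph looks fully
# connected even though the joint graph was a sparse star.
```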

An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text

Title An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text
Authors Suriyadeepan Ramamoorthy, Selvakumar Murugan
Abstract Adverse reactions caused by drugs are a potentially dangerous problem that may lead to mortality and morbidity in patients. Adverse Drug Event (ADE) extraction is a significant problem in biomedical research. We model ADE extraction as a Question-Answering problem and take inspiration from the Machine Reading Comprehension (MRC) literature to design our model. Our objective in designing such a model is to exploit the local linguistic context in clinical text and to enable intra-sequence interaction, in order to jointly learn to classify drug and disease entities and to extract adverse reactions caused by a given drug. Our model makes use of a self-attention mechanism to facilitate intra-sequence interaction in a text sequence. This enables us to visualize and understand how the network makes use of the local and wider context for classification.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2018-01-02
URL http://arxiv.org/abs/1801.00625v1
PDF http://arxiv.org/pdf/1801.00625v1.pdf
PWC https://paperswithcode.com/paper/an-attentive-sequence-model-for-adverse-drug
Repo
Framework
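
A minimal sketch of the self-attention step mentioned in the abstract (plain scaled dot-product attention over one token sequence, not the authors' full QA-style architecture); the dimensions and random projection matrices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, d_k=16):
    """Scaled dot-product self-attention over one token sequence X (T x d)."""
    d = X.shape[1]
    W_q, W_k, W_v = (rng.normal(scale=d ** -0.5, size=(d, d_k)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                     # T x T interaction matrix
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over the sequence
    return weights @ V, weights                         # attended features, attention map

# 10 clinical-text tokens with 32-dim embeddings (random stand-ins).
X = rng.normal(size=(10, 32))
attended, attn = self_attention(X)
print(attended.shape, attn.shape)  # (10, 16) (10, 10); each attention row sums to 1
```

The T x T attention map is what makes the intra-sequence interactions inspectable, which is how such models can be visualized for classification decisions.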

How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?

Title How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
Authors Richard Y. Zhang, Cédric Josz, Somayeh Sojoudi, Javad Lavaei
Abstract When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP), i.e. they are approximately norm-preserving, the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every $x$ is a spurious local minimum of some rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $\delta=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10251v2
PDF http://arxiv.org/pdf/1805.10251v2.pdf
PWC https://paperswithcode.com/paper/how-much-restricted-isometry-is-needed-in
Repo
Framework
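
For concreteness, the sketch below runs randomly initialized gradient descent on a tiny factorized rank-1 matrix recovery problem with Gaussian linear measurements; it is a generic instance of the nonconvex formulation discussed above, not the paper's specific RIP counterexample, and the problem sizes and step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 60                                    # matrix size, number of measurements

# Ground-truth rank-1 matrix and (symmetrized) Gaussian measurement matrices A_k.
x_star = rng.normal(size=n)
M_star = np.outer(x_star, x_star)
A = rng.normal(size=(m, n, n)) / np.sqrt(m)
A = (A + A.transpose(0, 2, 1)) / 2              # <A_k, x x^T> only sees the symmetric part
b = np.einsum("kij,ij->k", A, M_star)           # b_k = <A_k, x* x*^T>

# Nonconvex objective f(x) = sum_k (<A_k, x x^T> - b_k)^2, minimized over the factor x.
x = rng.normal(size=n)
for _ in range(3000):
    r = np.einsum("kij,ij->k", A, np.outer(x, x)) - b
    x -= 0.01 * 4 * np.einsum("k,kij,j->i", r, A, x)   # gradient step

print(np.linalg.norm(np.outer(x, x) - M_star))  # small unless a spurious minimum was hit
```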

PANDA: Facilitating Usable AI Development

Title PANDA: Facilitating Usable AI Development
Authors Jinyang Gao, Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Guoliang Li, Teck Khim Ng, Beng Chin Ooi, Sheng Wang, Jingren Zhou
Abstract Recent advances in artificial intelligence (AI) and machine learning have created a general perception that AI can be used to solve complex problems, and in some situations it is over-hyped as a tool that anyone can easily use. Unfortunately, the barrier to mass adoption of AI across business domains remains high because most domain experts have no background in AI. Developing AI applications involves multiple phases, namely data preparation, application modeling, and product deployment. The effort of AI research has been spent mostly on new AI models (in the model training stage) to improve performance on benchmark tasks such as image recognition. Many other factors, such as usability, efficiency and security of AI, have not been well addressed, and therefore form a barrier to democratizing AI. Further, for many real-world applications such as healthcare and autonomous driving, learning via huge amounts of exploration is not feasible since humans are involved. In many complex applications such as healthcare, subject matter experts (e.g., clinicians) are the ones who appreciate the importance of features that affect health, and their knowledge, together with existing knowledge bases, is critical to the end results. In this paper, we take a new perspective on developing AI solutions and present a solution for making AI usable. We hope that this solution will enable subject matter experts (e.g., clinicians) to exploit AI as data scientists do.
Tasks Autonomous Driving
Published 2018-04-26
URL http://arxiv.org/abs/1804.09997v1
PDF http://arxiv.org/pdf/1804.09997v1.pdf
PWC https://paperswithcode.com/paper/panda-facilitating-usable-ai-development
Repo
Framework

Multi-Scale Spatially-Asymmetric Recalibration for Image Classification

Title Multi-Scale Spatially-Asymmetric Recalibration for Image Classification
Authors Yan Wang, Lingxi Xie, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Alan L. Yuille
Abstract Convolution is spatially-symmetric, i.e., the visual features are independent of its position in the image, which limits its ability to utilize contextual cues for visual recognition. This paper addresses this issue by introducing a recalibration process, which refers to the surrounding region of each neuron, computes an importance value and multiplies it to the original neural response. Our approach is named multi-scale spatially-asymmetric recalibration (MS-SAR), which extracts visual cues from surrounding regions at multiple scales, and designs a weighting scheme which is asymmetric in the spatial domain. MS-SAR is implemented in an efficient way, so that only small fractions of extra parameters and computations are required. We apply MS-SAR to several popular building blocks, including the residual block and the densely-connected block, and demonstrate its superior performance in both CIFAR and ILSVRC2012 classification tasks.
Tasks Image Classification
Published 2018-04-03
URL http://arxiv.org/abs/1804.00787v1
PDF http://arxiv.org/pdf/1804.00787v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-spatially-asymmetric
Repo
Framework
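
The sketch below conveys the general recalibration idea (compute an importance value per spatial position from its surrounding region and multiply it onto the original response); it is a simplified, single-scale NumPy toy, not the paper's MS-SAR module.

```python
import numpy as np

def spatial_recalibrate(feat, k=3):
    """Reweight each spatial position of feat (C x H x W) by a value computed
    from the average response in its k x k neighbourhood (one scale only)."""
    C, H, W = feat.shape
    pad = k // 2
    padded = np.pad(feat.mean(axis=0), pad, mode="edge")      # per-position context
    context = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            context[i, j] = padded[i:i + k, j:j + k].mean()
    weights = 1.0 / (1.0 + np.exp(-context))                  # sigmoid importance map
    return feat * weights                                     # broadcast over channels

feat = np.random.default_rng(0).normal(size=(8, 6, 6))
out = spatial_recalibrate(feat)
print(out.shape)  # (8, 6, 6): same shape, responses rescaled position by position
```

The multi-scale, asymmetric version described in the abstract would compute such importance maps from several neighbourhood sizes and learn the weighting, rather than using a fixed sigmoid of the local mean.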

PaDNet: Pan-Density Crowd Counting

Title PaDNet: Pan-Density Crowd Counting
Authors Yukun Tian, Yiming Lei, Junping Zhang, James Z. Wang
Abstract The problem of counting crowds in varying density scenes or in different density regions of the same scene, named pan-density crowd counting, is highly challenging. Previous methods are designed for single-density scenes or do not fully utilize pan-density information. We propose a novel framework, the Pan-Density Network (PaDNet), for pan-density crowd counting. In order to effectively capture pan-density information, PaDNet has a novel module, the Density-Aware Network (DAN), that contains multiple sub-networks pretrained on scenarios with different densities. A module named the Feature Enhancement Layer (FEL) is then proposed to aggregate the feature maps learned by DAN; it learns an enhancement rate, or weight, for each feature map to boost these feature maps. In addition, we propose two refined metrics, Patch MAE (PMAE) and Patch RMSE (PRMSE), for better evaluating model performance in pan-density scenarios. Extensive experiments on four crowd counting benchmark datasets indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting.
Tasks Crowd Counting
Published 2018-11-07
URL https://arxiv.org/abs/1811.02805v3
PDF https://arxiv.org/pdf/1811.02805v3.pdf
PWC https://paperswithcode.com/paper/padnet-pan-density-crowd-counting
Repo
Framework
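
Since the abstract introduces Patch MAE and Patch RMSE only by name, the sketch below shows one plausible reading of such patch-level metrics: split each density map into a grid of patches and aggregate the per-patch count errors. This is a hedged interpretation for illustration, not necessarily the paper's exact definition.

```python
import numpy as np

def patch_errors(pred_density, gt_density, grid=4):
    """Per-patch count errors for one image, using a grid x grid split."""
    H, W = gt_density.shape
    errs = []
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * H // grid, (i + 1) * H // grid)
            xs = slice(j * W // grid, (j + 1) * W // grid)
            errs.append(pred_density[ys, xs].sum() - gt_density[ys, xs].sum())
    errs = np.array(errs)
    pmae = np.abs(errs).mean()             # patch-level MAE for this image
    prmse = np.sqrt((errs ** 2).mean())    # patch-level RMSE for this image
    return pmae, prmse

rng = np.random.default_rng(0)
gt = rng.random((64, 64)) * 0.01                         # toy ground-truth density map
pred = gt + rng.normal(scale=0.002, size=gt.shape)       # toy prediction
print(patch_errors(pred, gt))
```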

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

Title Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion
Authors Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang
Abstract This paper presents a refinement framework of WaveNet vocoders for variational autoencoder (VAE) based voice conversion (VC), which reduces the quality distortion caused by the mismatch between the training data and testing data. Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation. In this work, we take advantage of the particular structure of VAEs to refine WaveNet vocoders with the self-reconstructed features generated by the VAE, which have characteristics similar to the converted features while sharing the temporal structure of the target natural features. We analyze these features and show that the self-reconstructed features are similar to the converted features. Objective and subjective experimental results demonstrate the effectiveness of our proposed framework.
Tasks Voice Conversion
Published 2018-11-27
URL https://arxiv.org/abs/1811.11078v2
PDF https://arxiv.org/pdf/1811.11078v2.pdf
PWC https://paperswithcode.com/paper/refined-wavenet-vocoder-for-variational
Repo
Framework
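
To illustrate only the data flow described above, the toy sketch below produces "self-reconstructed" features by passing natural features through a stand-in autoencoder; in the paper these features come from the trained VAE and serve as the WaveNet vocoder's conditioning inputs during refinement, whereas the linear autoencoder and feature dimensions here are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the trained VAE: a fixed random linear encoder/decoder pair.
D, Z = 40, 8                                       # feature dim, latent dim (arbitrary)
W_enc = rng.normal(scale=D ** -0.5, size=(D, Z))
W_dec = rng.normal(scale=Z ** -0.5, size=(Z, D))

def self_reconstruct(natural_feats):
    """Encode then decode the speaker's own features: same temporal structure,
    but with the reconstruction error characteristic of converted features."""
    return natural_feats @ W_enc @ W_dec

natural = rng.normal(size=(200, D))                # 200 frames of acoustic features
self_recon = self_reconstruct(natural)

# Vocoder refinement would then pair (self_recon[t], natural_waveform[t]) as
# (conditioning input, target) instead of (natural[t], natural_waveform[t]).
print(np.mean((self_recon - natural) ** 2))        # nonzero: mimics the train/test mismatch
```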

Hierarchical Genetic Algorithms with evolving objective functions

Title Hierarchical Genetic Algorithms with evolving objective functions
Authors Harshavardhan Kamarthi, Kousik Krishnan
Abstract We propose a framework of genetic algorithms which use multi-level hierarchies to solve an optimization problem by searching over the space of simpler objective functions. We solve a variant of the Travelling Salesman Problem called soft-TSP and show that when the constraints on the overall objective function are changed, the algorithm adapts to produce solutions for the changed objective. We use this idea to speed up learning by systematically altering the constraints to find a more globally optimal solution. We also use this framework to solve polynomial regression, where the actual objective function is unknown but searching over the space of available objective functions yields a good approximate solution.
Tasks
Published 2018-12-01
URL https://arxiv.org/abs/1812.10308v3
PDF https://arxiv.org/pdf/1812.10308v3.pdf
PWC https://paperswithcode.com/paper/hierarchical-genetic-algorithms-with-evolving
Repo
Framework
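
As a minimal, generic illustration of a genetic algorithm whose objective changes during the run (not the paper's hierarchical soft-TSP setup), the sketch below evolves bit strings and switches the fitness function halfway through; the population size, mutation rate, and toy objectives are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, POP, GENS = 20, 40, 60

def fitness(pop, phase):
    # Phase 0: maximise the number of ones; phase 1: prefer alternating bits.
    if phase == 0:
        return pop.sum(axis=1)
    return (pop[:, 1:] != pop[:, :-1]).sum(axis=1)

pop = rng.integers(0, 2, size=(POP, L))
for gen in range(GENS):
    phase = 0 if gen < GENS // 2 else 1               # the objective changes here
    fit = fitness(pop, phase)
    parents = pop[np.argsort(fit)[-POP // 2:]]        # truncation selection
    children = parents[rng.integers(0, len(parents), POP)]
    flips = rng.random(children.shape) < 0.05         # mutation
    pop = np.where(flips, 1 - children, children)

print(fitness(pop, 1).max(), "of", L - 1)  # best alternation score after the switch
```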

ACM RecSys 2018 Late-Breaking Results Proceedings

Title ACM RecSys 2018 Late-Breaking Results Proceedings
Authors Christoph Trattner, Vanessa Murdock, Steven Chang
Abstract The ACM RecSys’18 Late-Breaking Results track (previously known as the Poster track) is part of the main program of the 2018 ACM Conference on Recommender Systems in Vancouver, Canada. The track attracted 48 submissions this year, of which 18 papers were accepted, resulting in an acceptance rate of 37.5%.
Tasks Recommendation Systems
Published 2018-09-11
URL http://arxiv.org/abs/1809.04106v1
PDF http://arxiv.org/pdf/1809.04106v1.pdf
PWC https://paperswithcode.com/paper/acm-recsys-2018-late-breaking-results
Repo
Framework

Structured Analysis Dictionary Learning for Image Classification

Title Structured Analysis Dictionary Learning for Image Classification
Authors Wen Tang, Ashkan Panahi, Hamid Krim, Liyi Dai
Abstract We propose a computationally efficient and high-performance classification algorithm by incorporating class structural information in analysis dictionary learning. To achieve more consistent classification, we associate a class-characteristic structure of independent subspaces and impose it on the classification-error-constrained analysis dictionary learning. Experiments demonstrate that our method achieves comparable or better performance than state-of-the-art algorithms in a variety of visual classification tasks. In addition, our method greatly reduces training and testing computational complexity.
Tasks Dictionary Learning, Image Classification
Published 2018-05-02
URL http://arxiv.org/abs/1805.00597v1
PDF http://arxiv.org/pdf/1805.00597v1.pdf
PWC https://paperswithcode.com/paper/structured-analysis-dictionary-learning-for
Repo
Framework
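
The sketch below is a generic analysis-dictionary pipeline meant only to fix the terminology: apply an analysis operator, hard-threshold the responses to get a code, and classify on the codes. The random operator, toy data, and nearest-class-mean classifier are assumptions; the paper's structured, class-subspace formulation is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def analysis_code(X, Omega, sparsity=8):
    """Analysis representation: apply Omega to each sample and keep only the
    'sparsity' largest-magnitude responses (hard thresholding)."""
    Z = X @ Omega.T
    cutoff = -np.sort(-np.abs(Z), axis=1)[:, sparsity - 1:sparsity]
    return np.where(np.abs(Z) >= cutoff, Z, 0.0)

# Two toy classes: distinct mean directions plus noise in a 20-dim space.
n, d, atoms = 200, 20, 40
mus = rng.normal(scale=2.0, size=(2, d))
X = np.vstack([mu + rng.normal(size=(n, d)) for mu in mus])
y = np.repeat([0, 1], n)

Omega = rng.normal(size=(atoms, d)) / np.sqrt(d)   # stand-in analysis dictionary
Z = analysis_code(X, Omega)

# Nearest-class-mean classification on the analysis codes.
means = np.stack([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - means) ** 2).sum(axis=-1), axis=1)
print("training accuracy:", (pred == y).mean())
```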

Relational Forward Models for Multi-Agent Learning

Title Relational Forward Models for Multi-Agent Learning
Authors Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia
Abstract The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents’ future behavior in multi-agent environments. Because these models operate on the discrete entities and relations present in the environment, they produce interpretable intermediate representations which offer insights into what drives agents’ behavior, and what events mediate the intensity and valence of social interactions. Furthermore, we show that embedding RFM modules inside agents results in faster learning systems compared to non-augmented baselines. As more and more of the autonomous systems we develop and interact with become multi-agent in nature, developing richer analysis tools for characterizing how and why agents make decisions is increasingly necessary. Moreover, developing artificial agents that quickly and safely learn to coordinate with one another, and with humans in shared environments, is crucial.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1809.11044v1
PDF http://arxiv.org/pdf/1809.11044v1.pdf
PWC https://paperswithcode.com/paper/relational-forward-models-for-multi-agent
Repo
Framework
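
A toy relational forward model in the graph-network spirit, not the authors' RFM architecture: agents are nodes, an adjacency matrix encodes relations, and one round of message passing yields per-agent next-action logits; all weights and sizes are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, d, n_actions = 4, 16, 5

# Random stand-ins for learned parameters.
W_msg = rng.normal(scale=d ** -0.5, size=(d, d))
W_upd = rng.normal(scale=d ** -0.5, size=(2 * d, d))
W_out = rng.normal(scale=d ** -0.5, size=(d, n_actions))

def forward_model(node_feats, adj):
    """One round of message passing, then per-agent next-action logits."""
    messages = adj @ (node_feats @ W_msg)             # aggregate neighbours' messages
    h = np.tanh(np.concatenate([node_feats, messages], axis=1) @ W_upd)
    return h @ W_out                                  # n_agents x n_actions logits

feats = rng.normal(size=(n_agents, d))                          # per-agent observations
adj = (rng.random((n_agents, n_agents)) < 0.5).astype(float)    # who relates to whom
np.fill_diagonal(adj, 0.0)
print(forward_model(feats, adj).shape)                # (4, 5)
```

The intermediate messages are what make such models interpretable: they are tied to explicit entities and relations, so their magnitudes can be read as how much each relation drives an agent's predicted behavior.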

Quantum-inspired classical algorithms for principal component analysis and supervised clustering

Title Quantum-inspired classical algorithms for principal component analysis and supervised clustering
Authors Ewin Tang
Abstract We describe classical analogues to Lloyd et al.'s quantum algorithms for principal component analysis and nearest-centroid clustering. We introduce a classical algorithm model that assumes we can efficiently perform $\ell^2$-norm sampling of input data, a natural analogue to quantum algorithms assuming efficient state preparation. In this model, our classical algorithms run in time polylogarithmic in input size, matching the runtime of the quantum algorithms with only polynomial slowdown. These algorithms indicate that their corresponding problems do not yield exponential quantum speedups.
Tasks Recommendation Systems
Published 2018-10-31
URL https://arxiv.org/abs/1811.00414v2
PDF https://arxiv.org/pdf/1811.00414v2.pdf
PWC https://paperswithcode.com/paper/quantum-inspired-classical-algorithms-for
Repo
Framework
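
The sampling primitive the abstract assumes, drawing an index of a vector with probability proportional to its squared entry, is easy to state concretely. The sketch below implements it naively in linear time; the quantum-inspired algorithms assume a data structure that provides such samples in polylogarithmic time.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_sample(x, size=1):
    """Sample indices i with probability |x_i|^2 / ||x||^2 (naive O(n) version)."""
    p = np.abs(x) ** 2
    return rng.choice(len(x), size=size, p=p / p.sum())

x = rng.normal(size=1000)
idx = l2_sample(x, size=20000)
# Heavily weighted coordinates are sampled far more often than light ones.
heavy, light = np.argmax(np.abs(x)), np.argmin(np.abs(x))
print(np.mean(idx == heavy), np.mean(idx == light))
```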

Large scale distributed neural network training through online distillation

Title Large scale distributed neural network training through online distillation
Authors Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton
Abstract Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforward to use as it does not require a complicated multi-stage setup or many new hyperparameters. Our first claim is that online distillation enables us to use extra parallelism to fit very large datasets about twice as fast. Crucially, we can still speed up training even after we have already reached the point at which additional parallelism provides no benefit for synchronous or asynchronous stochastic gradient descent. Two neural networks trained on disjoint subsets of the data can share knowledge by encouraging each model to agree with the predictions the other model would have made. These predictions can come from a stale version of the other model so they can be safely computed using weights that only rarely get transmitted. Our second claim is that online distillation is a cost-effective way to make the exact predictions of a model dramatically more reproducible. We support our claims using experiments on the Criteo Display Ad Challenge dataset, ImageNet, and the largest to-date dataset used for neural language modeling, containing $6\times 10^{11}$ tokens and based on the Common Crawl repository of web data.
Tasks Language Modelling
Published 2018-04-09
URL http://arxiv.org/abs/1804.03235v1
PDF http://arxiv.org/pdf/1804.03235v1.pdf
PWC https://paperswithcode.com/paper/large-scale-distributed-neural-network
Repo
Framework
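
A minimal sketch of the codistillation idea described above: two models train on disjoint data shards, each fitting its own labels while also matching the other's (possibly stale) predictions. The toy logistic-regression models, loss weighting, and refresh schedule are illustrative assumptions, not the paper's large-scale setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Disjoint data shards for two workers, same underlying labeling rule.
w_true = rng.normal(size=10)
X = rng.normal(size=(2, 500, 10))
y = (X @ w_true > 0).astype(float)

w = [np.zeros(10), np.zeros(10)]       # the two models
stale = [np.zeros(10), np.zeros(10)]   # rarely exchanged copies of the peer
alpha, lr = 0.5, 0.1                   # distillation weight, step size

for step in range(500):
    if step % 50 == 0:                 # stale peer weights are refreshed only rarely
        stale = [w[0].copy(), w[1].copy()]
    for k in (0, 1):
        Xk, yk = X[k], y[k]
        p = sigmoid(Xk @ w[k])
        peer = sigmoid(Xk @ stale[1 - k])              # teacher: the other model
        target = (1 - alpha) * yk + alpha * peer       # blend labels and peer predictions
        w[k] -= lr * Xk.T @ (p - target) / len(yk)     # logistic-loss gradient step

agree = np.mean((sigmoid(X[0] @ w[0]) > 0.5) == (sigmoid(X[0] @ w[1]) > 0.5))
print("prediction agreement between the two models:", agree)
```

Because each model only needs the other's predictions, which can come from a stale copy, the peer weights rarely need to be transmitted, which is what makes the scheme attractive for distributed training.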

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

Title Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning
Authors Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko
Abstract Driving scene understanding is a key ingredient for intelligent transportation systems. Systems that operate in a complex physical and social environment need to understand and learn how humans drive and interact with traffic scenes. We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors. We provide a detailed analysis of HDD with a comparison to other driving datasets. A novel annotation methodology is introduced to enable research on driver behavior understanding from untrimmed data sequences. As a first step, baseline algorithms for driver behavior detection are trained and tested to demonstrate the feasibility of the proposed task.
Tasks Scene Understanding
Published 2018-11-06
URL http://arxiv.org/abs/1811.02307v1
PDF http://arxiv.org/pdf/1811.02307v1.pdf
PWC https://paperswithcode.com/paper/toward-driving-scene-understanding-a-dataset
Repo
Framework