Paper Group ANR 1080
RNNs Implicitly Implement Tensor Product Representations. Learning the effect of latent variables in Gaussian Graphical models with unobserved variables. An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text. How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?. PANDA: Facilitating Usable AI Development. …
RNNs Implicitly Implement Tensor Product Representations
Title | RNNs Implicitly Implement Tensor Product Representations |
Authors | R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky |
Abstract | Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively combine tensor products of vectors representing roles (e.g., sequence positions) and vectors representing fillers (e.g., particular words). To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. We demonstrate using synthetic data that TPDNs can successfully approximate linear and tree-based RNN autoencoder representations, suggesting that these representations exhibit interpretable compositional structure; we explore the settings that lead RNNs to induce such structure-sensitive representations. By contrast, further TPDN experiments show that the representations of four models trained to encode naturally-occurring sentences can be largely approximated with a bag of words, with only marginal improvements from more sophisticated structures. We conclude that TPDNs provide a powerful method for interpreting vector representations, and that standard RNNs can induce compositional sequence representations that are remarkably well approximated by TPRs; at the same time, existing training tasks for sentence representation learning may not be sufficient for inducing robust structural representations. |
Tasks | Representation Learning |
Published | 2018-12-20 |
URL | http://arxiv.org/abs/1812.08718v2 |
PDF | http://arxiv.org/pdf/1812.08718v2.pdf |
PWC | https://paperswithcode.com/paper/rnns-implicitly-implement-tensor-product-1 |
Repo | |
Framework | |
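As a rough illustration of the tensor product representation (TPR) described in the abstract above, here is a minimal NumPy sketch. The filler vectors, their dimension, and the use of one-hot (orthonormal) position roles are hypothetical choices for illustration. A TPDN additionally learns filler and role embeddings plus a linear map to match an existing RNN encoding; that fitting step is not shown.

```python
import numpy as np

def tpr_encode(fillers, roles):
    """Bind each filler to its role with an outer product and sum the bindings."""
    return sum(np.outer(f, r) for f, r in zip(fillers, roles))

def tpr_unbind(tpr, role):
    """Recover the filler bound to `role`; exact when the role vectors are orthonormal."""
    return tpr @ role

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=4) for w in "abc"}  # hypothetical 4-d filler (word) vectors
roles = np.eye(3)                                # one orthonormal role vector per position
seq = ["b", "a", "c"]
T = tpr_encode([vocab[w] for w in seq], roles)   # 4 x 3 tensor product representation
print(np.allclose(tpr_unbind(T, roles[0]), vocab["b"]))  # True
```

If every position is given the same role vector, the TPR collapses to (sum of fillers) times that role, which discards word order; this corresponds to the bag-of-words approximation the abstract compares against.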
Learning the effect of latent variables in Gaussian Graphical models with unobserved variables
Title | Learning the effect of latent variables in Gaussian Graphical models with unobserved variables |
Authors | Marina Vinyes, Guillaume Obozinski |
Abstract | The edge structure of the graph defining an undirected graphical model describes precisely the structure of dependence between the variables in the graph. In many applications, the dependence structure is unknown and it is desirable to learn it from data, often as a preliminary step toward ascertaining causal effects. This problem, known as structure learning, is hard in general, but for Gaussian graphical models it is slightly easier because the structure of the graph is given by the sparsity pattern of the precision matrix of the joint distribution, and because independence coincides with decorrelation. A major difficulty too often ignored in structure learning is that, if some variables are not observed, the marginal dependence graph over the observed variables may be significantly more complex and no longer reflect the direct dependencies that are potentially associated with causal effects. In this work, we consider a family of latent variable Gaussian graphical models in which the graph of the joint distribution between observed and unobserved variables is sparse, and the unobserved variables are conditionally independent given the others. Prior work was able to recover the connectivity between observed variables but could only identify the subspace spanned by unobserved variables, whereas we propose a convex optimization formulation based on structured matrix sparsity to estimate the full connectivity of the complete graph, including unobserved variables, given the number of missing variables and a priori knowledge of their level of connectivity. Our formulation is supported by a theoretical identifiability result for the latent dependence structure of sparse graphs in the infinite data limit. We propose an algorithm leveraging recent active set methods, which performs well in experiments on synthetic data. |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07754v2 |
PDF | http://arxiv.org/pdf/1807.07754v2.pdf |
PWC | https://paperswithcode.com/paper/learning-the-effect-of-latent-variables-in |
Repo | |
Framework | |
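The abstract's point that hidden variables make the observed dependence graph more complex can be seen directly from the precision matrix: marginalizing out latent variables replaces the observed block with a Schur complement, which can fill in new edges. The small 5-variable model below is a hypothetical example (one latent variable, as in the abstract's setting where latents are conditionally independent given the others); it is not the authors' convex estimator.

```python
import numpy as np

def edges(P, tol=1e-9):
    """Edges of the Gaussian graphical model: nonzero off-diagonal precision entries."""
    n = P.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if abs(P[i, j]) > tol}

# Hypothetical joint precision matrix: observed variables 0-3 form a chain,
# and one latent variable (index 4) is connected to observed variables 0 and 3.
K = np.array([
    [ 2.0, -0.5,  0.0,  0.0, 0.4],
    [-0.5,  2.0, -0.5,  0.0, 0.0],
    [ 0.0, -0.5,  2.0, -0.5, 0.0],
    [ 0.0,  0.0, -0.5,  2.0, 0.4],
    [ 0.4,  0.0,  0.0,  0.4, 2.0],
])
obs, lat = [0, 1, 2, 3], [4]
K_oo = K[np.ix_(obs, obs)]
K_oh = K[np.ix_(obs, lat)]
K_hh = K[np.ix_(lat, lat)]

# Precision of the observed marginal = Schur complement of the latent block.
K_marg = K_oo - K_oh @ np.linalg.solve(K_hh, K_oh.T)

print(sorted(edges(K_oo)))    # [(0, 1), (1, 2), (2, 3)]
print(sorted(edges(K_marg)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

The spurious edge (0, 3) in the marginal graph is induced purely by the shared latent parent; recovering the latent's own connections, rather than only this low-rank correction, is what the paper targets.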
An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text
Title | An Attentive Sequence Model for Adverse Drug Event Extraction from Biomedical Text |
Authors | Suriyadeepan Ramamoorthy, Selvakumar Murugan |
Abstract | Adverse reactions caused by drugs are a potentially dangerous problem that may lead to mortality and morbidity in patients. Adverse Drug Event (ADE) extraction is a significant problem in biomedical research. We model ADE extraction as a Question-Answering problem and take inspiration from the Machine Reading Comprehension (MRC) literature to design our model. Our objective in designing such a model is to exploit the local linguistic context in clinical text and enable intra-sequence interaction, in order to jointly learn to classify drug and disease entities and to extract adverse reactions caused by a given drug. Our model makes use of a self-attention mechanism to facilitate intra-sequence interaction in a text sequence. This enables us to visualize and understand how the network makes use of the local and wider context for classification. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-01-02 |
URL | http://arxiv.org/abs/1801.00625v1 |
PDF | http://arxiv.org/pdf/1801.00625v1.pdf |
PWC | https://paperswithcode.com/paper/an-attentive-sequence-model-for-adverse-drug |
Repo | |
Framework | |
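The abstract does not specify which form of self-attention the model uses. As an assumption, the sketch below uses generic scaled dot-product self-attention over one sequence; the projection matrices and sizes are hypothetical. The returned weight matrix `A` is the kind of token-to-token map that could be visualized to inspect local versus wider context.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)  # (n, n): row i = how token i attends to every token
    return A @ V, A

rng = np.random.default_rng(0)
n, d, dk = 5, 8, 4
X = rng.normal(size=(n, d))  # hypothetical token embeddings for a 5-token sentence
Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

Each output row is a convex combination of the value vectors of all tokens in the sequence, which is what lets every position interact with every other position.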
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
Title | How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery? |
Authors | Richard Y. Zhang, Cédric Josz, Somayeh Sojoudi, Javad Lavaei |
Abstract | When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP), i.e., they are approximately norm-preserving, the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every $x$ is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $\delta=1/2$ but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly isotropic instances. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10251v2 |
PDF | http://arxiv.org/pdf/1805.10251v2.pdf |
PWC | https://paperswithcode.com/paper/how-much-restricted-isometry-is-needed-in |
Repo | |
Framework | |
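RIP asks that a linear measurement operator $\mathcal{A}$ approximately preserve the Frobenius norm of low-rank matrices, $(1-\delta)\|M\|_F^2 \le \|\mathcal{A}(M)\|^2 \le (1+\delta)\|M\|_F^2$. Because $\delta$ is a worst case over all low-rank $M$, sampling random matrices can only give a lower bound on it. The sketch below assumes Gaussian measurement matrices and tests only random rank-1 PSD matrices $xx^T$; the sizes are arbitrary, and this is an illustration of the definition, not the paper's counterexample construction.

```python
import numpy as np

def measure(A_ops, M):
    """Linear measurement operator: b_i = <A_i, M> (Frobenius inner product)."""
    return np.tensordot(A_ops, M, axes=([1, 2], [0, 1]))

def rip_lower_bound(A_ops, n, trials, rng):
    """Largest observed |‖A(M)‖² / ‖M‖_F² - 1| over random rank-1 PSD matrices M."""
    worst = 0.0
    for _ in range(trials):
        x = rng.normal(size=n)
        M = np.outer(x, x)  # random rank-1 PSD test matrix
        ratio = np.sum(measure(A_ops, M) ** 2) / np.sum(M ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(0)
n, m = 6, 200
# m Gaussian measurement matrices scaled so that E‖A(M)‖² = ‖M‖_F².
A_ops = rng.normal(scale=1 / np.sqrt(m), size=(m, n, n))
delta_lb = rip_lower_bound(A_ops, n, trials=500, rng=rng)
```

A measurement set that is an orthonormal basis of all $n \times n$ matrices is a perfect isometry ($\delta = 0$); the abstract's message is that RIP constants well above that, such as $\delta = 1/2$, do not by themselves rule out spurious local minima.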
PANDA: Facilitating Usable AI Development
Title | PANDA: Facilitating Usable AI Development |
Authors | Jinyang Gao, Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Guoliang Li, Teck Khim Ng, Beng Chin Ooi, Sheng Wang, Jingren Zhou |
Abstract | Recent advances in artificial intelligence (AI) and machine learning have created a general perception that AI could be used to solve complex problems, and in some situations AI has been over-hyped as a tool that can be used easily. Unfortunately, the barrier to mass adoption of AI in various business domains is too high because most domain experts have no background in AI. Developing AI applications involves multiple phases, namely data preparation, application modeling, and product deployment. AI research effort has been spent mostly on new AI models (in the model training stage) to improve performance on benchmark tasks such as image recognition. Many other factors, such as the usability, efficiency, and security of AI, have not been well addressed and therefore form a barrier to democratizing AI. Further, for many real-world applications such as healthcare and autonomous driving, learning by exploring huge numbers of possibilities is not feasible because humans are involved. In many complex applications such as healthcare, subject matter experts (e.g., clinicians) are the ones who appreciate the importance of features that affect health, and their knowledge, together with existing knowledge bases, is critical to the end results. In this paper, we take a new perspective on developing AI solutions and present a solution for making AI usable. We hope that this solution will enable all subject matter experts (e.g., clinicians) to exploit AI as data scientists do. |
Tasks | Autonomous Driving |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.09997v1 |
PDF | http://arxiv.org/pdf/1804.09997v1.pdf |
PWC | https://paperswithcode.com/paper/panda-facilitating-usable-ai-development |
Repo | |
Framework | |
Multi-Scale Spatially-Asymmetric Recalibration for Image Classification
Title | Multi-Scale Spatially-Asymmetric Recalibration for Image Classification |
Authors | Yan Wang, Lingxi Xie, Siyuan Qiao, Ya Zhang, Wenjun Zhang, Alan L. Yuille |
Abstract | Convolution is spatially symmetric, i.e., the visual features are independent of their position in the image, which limits its ability to utilize contextual cues for visual recognition. This paper addresses this issue by introducing a recalibration process, which refers to the surrounding region of each neuron, computes an importance value, and multiplies the original neural response by it. Our approach, named multi-scale spatially-asymmetric recalibration (MS-SAR), extracts visual cues from surrounding regions at multiple scales and designs a weighting scheme that is asymmetric in the spatial domain. MS-SAR is implemented efficiently, so that only a small fraction of extra parameters and computation is required. We apply MS-SAR to several popular building blocks, including the residual block and the densely-connected block, and demonstrate its superior performance on both CIFAR and ILSVRC2012 classification tasks. |
Tasks | Image Classification |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00787v1 |
PDF | http://arxiv.org/pdf/1804.00787v1.pdf |
PWC | https://paperswithcode.com/paper/multi-scale-spatially-asymmetric |
Repo | |
Framework | |
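The recalibration step described above (look at each neuron's surroundings, compute an importance value, multiply the response by it) can be illustrated with a generic multi-scale context gate. This is a simplified stand-in, not MS-SAR itself: it assumes symmetric square windows, fixed equal scale weights, and a sigmoid gate, whereas the paper learns its weights and uses a spatially asymmetric weighting scheme. The radii and the feature map are hypothetical.

```python
import numpy as np

def box_mean(x, radius):
    """Average of x over a (2r+1) x (2r+1) window around each position (edge padding)."""
    h, w = x.shape
    k = 2 * radius + 1
    p = np.pad(x, radius, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def recalibrate(response, radii=(1, 2, 4), scale_weights=None, bias=0.0):
    """Gate each neuron by a sigmoid importance computed from its multi-scale surroundings."""
    if scale_weights is None:
        scale_weights = np.full(len(radii), 1.0 / len(radii))  # hypothetical equal weights
    context = sum(a * box_mean(response, r) for a, r in zip(scale_weights, radii))
    importance = 1.0 / (1.0 + np.exp(-(context + bias)))  # values in (0, 1)
    return response * importance

fmap = np.random.default_rng(0).normal(size=(8, 8))  # one hypothetical feature map
out = recalibrate(fmap)
```

Because the gate lies in (0, 1), recalibration can only attenuate a response; the only extra parameters are the per-scale weights and bias, which loosely mirrors the abstract's claim that only a small fraction of extra parameters is needed.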
PaDNet: Pan-Density Crowd Counting
Title | PaDNet: Pan-Density Crowd Counting |
Authors | Yukun Tian, Yiming Lei, Junping Zhang, James Z. Wang |
Abstract | The problem of counting crowds in scenes of varying density, or in regions of different density within the same scene, referred to as pan-density crowd counting, is highly challenging. Previous methods are designed for single-density scenes or do not fully utilize pan-density information. We propose a novel framework, the Pan-Density Network (PaDNet), for pan-density crowd counting. To effectively capture pan-density information, PaDNet has a novel module, the Density-Aware Network (DAN), that contains multiple sub-networks pretrained on scenarios with different densities. Further, a module named the Feature Enhancement Layer (FEL) is proposed to aggregate the feature maps learned by DAN. It learns an enhancement rate or a weight for each feature map to boost these feature maps. In addition, we propose two refined metrics, Patch MAE (PMAE) and Patch RMSE (PRMSE), to better evaluate model performance in pan-density scenarios. Extensive experiments on four crowd counting benchmark datasets indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting. |
Tasks | Crowd Counting |
Published | 2018-11-07 |
URL | https://arxiv.org/abs/1811.02805v3 |
PDF | https://arxiv.org/pdf/1811.02805v3.pdf |
PWC | https://paperswithcode.com/paper/padnet-pan-density-crowd-counting |
Repo | |
Framework | |
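The abstract does not define PMAE and PRMSE. Assuming they split each density map into a k x k grid of non-overlapping patches and compute MAE/RMSE over the per-patch counts, a minimal sketch looks like this. The grid size `k` and the toy maps are hypothetical.

```python
import numpy as np

def patch_counts(density, k):
    """Split a density map into a k x k grid of patches and sum the count in each."""
    return np.array([[blk.sum() for blk in np.array_split(row, k, axis=1)]
                     for row in np.array_split(density, k, axis=0)])

def patch_mae_rmse(pred_maps, gt_maps, k=4):
    """Patch-level MAE and RMSE over all patches of all images."""
    errs = np.concatenate([(patch_counts(p, k) - patch_counts(g, k)).ravel()
                           for p, g in zip(pred_maps, gt_maps)])
    return np.abs(errs).mean(), np.sqrt(np.mean(errs ** 2))

# Same total count (4 people), but predicted in the wrong corner of the image.
gt = np.zeros((8, 8))
gt[0, 0] = 4.0
pred = np.zeros((8, 8))
pred[7, 7] = 4.0
pmae, prmse = patch_mae_rmse([pred], [gt], k=2)  # pmae == 2.0, prmse == sqrt(8)
```

The image-level count error in this toy case is zero, yet the patch metrics are not, which is why patch-level errors are stricter for scenes whose density varies across regions.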
Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion
Title | Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion |
Authors | Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang |
Abstract | This paper presents a refinement framework of WaveNet vocoders for variational autoencoder (VAE) based voice conversion (VC), which reduces the quality distortion caused by the mismatch between the training data and testing data. Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation. In this work, we take advantage of the particular structure of VAEs to refine WaveNet vocoders with the self-reconstructed features generated by the VAE, which have characteristics similar to those of the converted features while sharing the same temporal structure as the target natural features. We analyze these features and show that the self-reconstructed features are similar to the converted features. Objective and subjective experimental results demonstrate the effectiveness of our proposed framework. |
Tasks | Voice Conversion |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.11078v2 |
PDF | https://arxiv.org/pdf/1811.11078v2.pdf |
PWC | https://paperswithcode.com/paper/refined-wavenet-vocoder-for-variational |
Repo | |
Framework | |
Hierarchical Genetic Algorithms with evolving objective functions
Title | Hierarchical Genetic Algorithms with evolving objective functions |
Authors | Harshavardhan Kamarthi, Kousik Krishnan |
Abstract | We propose a framework of genetic algorithms that use multi-level hierarchies to solve an optimization problem by searching over the space of simpler objective functions. We solve a variant of the Travelling Salesman Problem called soft-TSP and show that, when the constraints on the overall objective function are changed, the algorithm adapts to produce solutions for the changed objective. We use this idea to speed up learning by systematically altering the constraints to find a more globally optimal solution. We also use this framework to solve polynomial regression, where the actual objective function is unknown but searching over the space of available objective functions yields a good approximate solution. |
Tasks | |
Published | 2018-12-01 |
URL | https://arxiv.org/abs/1812.10308v3 |
PDF | https://arxiv.org/pdf/1812.10308v3.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-genetic-algorithms-with-evolving |
Repo | |
Framework | |
ACM RecSys 2018 Late-Breaking Results Proceedings
Title | ACM RecSys 2018 Late-Breaking Results Proceedings |
Authors | Christoph Trattner, Vanessa Murdock, Steven Chang |
Abstract | The ACM RecSys’18 Late-Breaking Results track (previously known as the Poster track) is part of the main program of the 2018 ACM Conference on Recommender Systems in Vancouver, Canada. The track attracted 48 submissions this year, of which 18 papers were accepted, resulting in an acceptance rate of 37.5%. |
Tasks | Recommendation Systems |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04106v1 |
PDF | http://arxiv.org/pdf/1809.04106v1.pdf |
PWC | https://paperswithcode.com/paper/acm-recsys-2018-late-breaking-results |
Repo | |
Framework | |
Structured Analysis Dictionary Learning for Image Classification
Title | Structured Analysis Dictionary Learning for Image Classification |
Authors | Wen Tang, Ashkan Panahi, Hamid Krim, Liyi Dai |
Abstract | We propose a computationally efficient and high-performance classification algorithm that incorporates class structural information in analysis dictionary learning. To achieve more consistent classification, we associate a class characteristic structure of independent subspaces and impose it on the classification-error-constrained analysis dictionary learning. Experiments demonstrate that our method achieves performance comparable to or better than state-of-the-art algorithms in a variety of visual classification tasks. In addition, our method greatly reduces the training and testing computational complexity. |
Tasks | Dictionary Learning, Image Classification |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00597v1 |
PDF | http://arxiv.org/pdf/1805.00597v1.pdf |
PWC | https://paperswithcode.com/paper/structured-analysis-dictionary-learning-for |
Repo | |
Framework | |
Relational Forward Models for Multi-Agent Learning
Title | Relational Forward Models for Multi-Agent Learning |
Authors | Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia |
Abstract | The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents’ future behavior in multi-agent environments. Because these models operate on the discrete entities and relations present in the environment, they produce interpretable intermediate representations which offer insights into what drives agents’ behavior, and what events mediate the intensity and valence of social interactions. Furthermore, we show that embedding RFM modules inside agents results in faster learning systems compared to non-augmented baselines. As more and more of the autonomous systems we develop and interact with become multi-agent in nature, developing richer analysis tools for characterizing how and why agents make decisions is increasingly necessary. Moreover, developing artificial agents that quickly and safely learn to coordinate with one another, and with humans in shared environments, is crucial. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11044v1 |
PDF | http://arxiv.org/pdf/1809.11044v1.pdf |
PWC | https://paperswithcode.com/paper/relational-forward-models-for-multi-agent |
Repo | |
Framework | |
Quantum-inspired classical algorithms for principal component analysis and supervised clustering
Title | Quantum-inspired classical algorithms for principal component analysis and supervised clustering |
Authors | Ewin Tang |
Abstract | We describe classical analogues to Lloyd et al.’s quantum algorithms for principal component analysis and nearest-centroid clustering. We introduce a classical algorithm model that assumes we can efficiently perform $\ell^2$-norm samples of input data, a natural analogue to quantum algorithms assuming efficient state preparation. In this model, our classical algorithms run in time polylogarithmic in input size, matching the runtime of the quantum algorithms with only polynomial slowdown. These algorithms indicate that their corresponding problems do not yield exponential quantum speedups. |
Tasks | Recommendation Systems |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1811.00414v2 |
PDF | https://arxiv.org/pdf/1811.00414v2.pdf |
PWC | https://paperswithcode.com/paper/quantum-inspired-classical-algorithms-for |
Repo | |
Framework | |
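The $\ell^2$-norm sampling model in the abstract assumes we can draw row $i$ of a matrix $A$ with probability $\|A_i\|^2 / \|A\|_F^2$. The sketch below shows only that sampling primitive and a standard unbiased estimator of $A^T A$ built from it; it is not Tang's full PCA or clustering algorithm. Here the probabilities are computed explicitly for convenience, whereas the model assumes a data structure that provides such samples cheaply. The matrix and sample size are hypothetical.

```python
import numpy as np

def length_squared_probs(A):
    """Row-sampling distribution p_i = ||A_i||^2 / ||A||_F^2 (the l2-norm sampling model)."""
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    return row_norms_sq / row_norms_sq.sum()

def approx_gram(A, s, rng):
    """Unbiased estimate of A^T A from s rows drawn with length-squared probabilities."""
    p = length_squared_probs(A)
    idx = rng.choice(A.shape[0], size=s, p=p)
    rows = A[idx]
    # Each sampled term A_i^T A_i / p_i has expectation sum_i A_i^T A_i = A^T A.
    return (rows / p[idx, None]).T @ rows / s

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
G_hat = approx_gram(A, s=20000, rng=rng)
rel_err = np.linalg.norm(G_hat - A.T @ A) / np.linalg.norm(A.T @ A)
```

With these probabilities every sampled term has the same Frobenius norm, $\|A\|_F^2$, so the estimator's variance is controlled by the number of samples rather than by the number of rows.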
Large scale distributed neural network training through online distillation
Title | Large scale distributed neural network training through online distillation |
Authors | Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton |
Abstract | Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforward to use as it does not require a complicated multi-stage setup or many new hyperparameters. Our first claim is that online distillation enables us to use extra parallelism to fit very large datasets about twice as fast. Crucially, we can still speed up training even after we have already reached the point at which additional parallelism provides no benefit for synchronous or asynchronous stochastic gradient descent. Two neural networks trained on disjoint subsets of the data can share knowledge by encouraging each model to agree with the predictions the other model would have made. These predictions can come from a stale version of the other model, so they can be safely computed using weights that only rarely get transmitted. Our second claim is that online distillation is a cost-effective way to make the exact predictions of a model dramatically more reproducible. We support our claims using experiments on the Criteo Display Ad Challenge dataset, ImageNet, and the largest dataset to date used for neural language modeling, containing $6\times 10^{11}$ tokens and based on the Common Crawl repository of web data. |
Tasks | Language Modelling |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03235v1 |
PDF | http://arxiv.org/pdf/1804.03235v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-distributed-neural-network |
Repo | |
Framework | |
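The online distillation idea above (each model is also trained to agree with predictions from a stale copy of its peer) can be sketched as a per-batch loss. The cross-entropy form of the agreement term and the weight `alpha` are assumptions for illustration; the abstract does not give the exact loss. The peer's logits are treated as fixed data, so no gradient would flow into the peer.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def codistillation_loss(logits, labels, stale_peer_logits, alpha=0.5):
    """Cross-entropy on the labels plus cross-entropy toward a stale peer's predictions."""
    logp = log_softmax(logits)
    ce = -logp[np.arange(len(labels)), labels].mean()
    peer_probs = np.exp(log_softmax(stale_peer_logits))  # soft targets from the peer
    distill = -(peer_probs * logp).sum(axis=-1).mean()
    return ce + alpha * distill

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4))      # this model's logits on a batch of 3 examples
labels = np.array([0, 2, 1])
stale_peer = rng.normal(size=(3, 4))  # logits from a rarely refreshed copy of the peer
loss = codistillation_loss(logits, labels, stale_peer, alpha=0.5)
```

Because the peer term only needs the peer's predictions on the current batch, the peer's weights can be exchanged infrequently, which matches the abstract's point about stale, rarely transmitted weights.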
Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning
Title | Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning |
Authors | Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko |
Abstract | Driving scene understanding is a key ingredient for intelligent transportation systems. To operate in complex physical and social environments, such systems need to understand and learn how humans drive and interact with traffic scenes. We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors. We provide a detailed analysis of HDD with a comparison to other driving datasets. A novel annotation methodology is introduced to enable research on driver behavior understanding from untrimmed data sequences. As a first step, baseline algorithms for driver behavior detection are trained and tested to demonstrate the feasibility of the proposed task. |
Tasks | Scene Understanding |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02307v1 |
PDF | http://arxiv.org/pdf/1811.02307v1.pdf |
PWC | https://paperswithcode.com/paper/toward-driving-scene-understanding-a-dataset |
Repo | |
Framework | |