Paper Group ANR 837
Semi-Implicit Stochastic Recurrent Neural Networks
Title | Semi-Implicit Stochastic Recurrent Neural Networks |
Authors | Ehsan Hajiramezanali, Arman Hasanzadeh, Nick Duffield, Krishna Narayanan, Mingyuan Zhou, Xiaoning Qian |
Abstract | Stochastic recurrent neural networks with latent random variables of complex dependency structures have been shown to be more successful in modeling sequential data than deterministic deep models. However, the majority of existing methods have limited expressive power due to the Gaussian assumption of latent variables. In this paper, we advocate learning implicit latent representations using semi-implicit variational inference to further increase model flexibility. The semi-implicit stochastic recurrent neural network (SIS-RNN) is developed to enrich inferred model posteriors that may have no analytic density functions, as long as independent random samples can be generated via reparameterization. Extensive experiments on different tasks with real-world datasets show that SIS-RNN outperforms the existing methods. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12819v1 |
https://arxiv.org/pdf/1910.12819v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-implicit-stochastic-recurrent-neural |
Repo | |
Framework | |
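A minimal sketch of the semi-implicit idea behind SIS-RNN, written as a generic PyTorch module rather than the authors' code: the mean of the conditional Gaussian is produced by a neural transform of auxiliary noise, so the marginal posterior is implicit (no closed-form density) while samples stay reparameterizable. All layer sizes and the network structure are illustrative assumptions.

```python
# Hedged sketch of a semi-implicit variational layer (not the authors' code).
import torch
import torch.nn as nn

class SemiImplicitLayer(nn.Module):
    def __init__(self, input_dim, latent_dim, noise_dim=16, hidden=64):
        super().__init__()
        # implicit mixing: auxiliary noise + input -> mean of the conditional Gaussian
        self.mean_net = nn.Sequential(
            nn.Linear(input_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim))
        # explicit conditional scale (kept Gaussian so sampling stays reparameterizable)
        self.logvar_net = nn.Linear(input_dim, latent_dim)
        self.noise_dim = noise_dim

    def forward(self, h):
        eps = torch.randn(h.size(0), self.noise_dim)      # auxiliary noise
        mu = self.mean_net(torch.cat([h, eps], dim=-1))   # implicitly distributed mean
        std = torch.exp(0.5 * self.logvar_net(h))
        z = mu + std * torch.randn_like(std)              # reparameterized latent sample
        return z, mu, std

# usage: draw a latent sample conditioned on an RNN hidden state
layer = SemiImplicitLayer(input_dim=32, latent_dim=8)
z, mu, std = layer(torch.randn(4, 32))
```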
Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues
Title | Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues |
Authors | Shachi Paul, Rahul Goel, Dilek Hakkani-Tür |
Abstract | Machine learning approaches for building task-oriented dialogue systems require large conversational datasets with labels to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently, multiple annotated task-oriented human-machine dialogue datasets have been released; however, their annotation schemas vary across different collections, even for well-defined categories such as dialogue acts (DAs). We propose a Universal DA schema for task-oriented dialogues and align existing annotated datasets with our schema. Our aim is to train a Universal DA tagger (U-DAT) for task-oriented dialogues and use it for tagging human-human conversations. We investigate multiple datasets, propose manual and automated approaches for aligning the different schemas, and present results on a target corpus of human-human dialogues. In unsupervised learning experiments, we achieve an F1 score of 54.1% on system turns in human-human dialogues. In a semi-supervised setup, the F1 score increases to 57.7%, which would otherwise require at least 1.7K manually annotated turns. For new domains, we show further improvements when unlabeled or labeled target domain data is available. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03020v1 |
https://arxiv.org/pdf/1907.03020v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-universal-dialogue-act-tagging-for |
Repo | |
Framework | |
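The alignment step the abstract describes boils down to mapping each corpus-specific dialogue-act label onto one shared tag set. The sketch below is a hypothetical illustration of that mapping; the dataset names, source labels, and universal act names are placeholders, not the schema proposed in the paper.

```python
# Hedged illustration of aligning dataset-specific dialogue-act labels to a
# shared schema; all label names below are illustrative stand-ins.
UNIVERSAL_DA_MAP = {
    "dataset_a": {"inform": "INFORM", "request": "REQUEST",
                  "confirm": "CONFIRM", "bye": "GOODBYE"},
    "dataset_b": {"Inform": "INFORM", "Request": "REQUEST",
                  "Recommend": "OFFER", "Bye": "GOODBYE"},
}

def to_universal(dataset: str, label: str) -> str:
    """Map a corpus-specific dialogue-act label to the universal schema."""
    return UNIVERSAL_DA_MAP.get(dataset, {}).get(label, "OTHER")

print(to_universal("dataset_b", "Recommend"))  # -> OFFER
```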
Predict Globally, Correct Locally: Parallel-in-Time Optimal Control of Neural Networks
Title | Predict Globally, Correct Locally: Parallel-in-Time Optimal Control of Neural Networks |
Authors | Panos Parpas, Corey Muir |
Abstract | The links between optimal control of dynamical systems and neural networks have proved beneficial both from a theoretical and from a practical point of view. Several researchers have exploited these links to investigate the stability of different neural network architectures and to develop memory-efficient training algorithms. We also adopt the dynamical systems view of neural networks, but our aim is different from earlier works. We exploit the links between dynamical systems, optimal control, and neural networks to develop a novel distributed optimization algorithm. The proposed algorithm addresses the most significant obstacle for distributed algorithms for neural network optimization: the network weights cannot be updated until the forward propagation of the data and the backward propagation of the gradients are complete. Using the dynamical systems point of view, we interpret the layers of a (residual) neural network as the discretized dynamics of a dynamical system and exploit the relationship between the co-states (adjoints) of the optimal control problem and backpropagation. We then develop a parallel-in-time method that updates the parameters of the network without waiting for the forward or backward propagation algorithms to complete in full. We establish the convergence of the proposed algorithm. Preliminary numerical results suggest that the algorithm is competitive and more efficient than the state-of-the-art. |
Tasks | Distributed Optimization |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02542v1 |
http://arxiv.org/pdf/1902.02542v1.pdf | |
PWC | https://paperswithcode.com/paper/predict-globally-correct-locally-parallel-in |
Repo | |
Framework | |
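A small sketch of the dynamical-systems viewpoint the abstract relies on: a residual block is read as one forward-Euler step of x' = f(x, theta), and backpropagation corresponds to the co-state (adjoint) recursion of the associated optimal control problem. The dynamics f and step size below are illustrative assumptions; the paper's parallel-in-time update scheme itself is not reproduced here.

```python
# Hedged sketch: residual layers as forward-Euler steps and backprop as the
# co-state (adjoint) recursion. Illustrates the viewpoint only.
import numpy as np

def f(x, W):                       # simple layer dynamics: f(x, theta) = tanh(W x)
    return np.tanh(W @ x)

def forward(x0, Ws, h=0.1):
    xs = [x0]
    for W in Ws:                   # x_{k+1} = x_k + h * f(x_k, W_k)
        xs.append(xs[-1] + h * f(xs[-1], W))
    return xs

def adjoint(xs, Ws, dloss_dxT, h=0.1):
    """Co-state recursion: lam_k = lam_{k+1} + h * (df/dx)^T lam_{k+1}."""
    lam = dloss_dxT
    lams = [lam]
    for x, W in zip(reversed(xs[:-1]), reversed(Ws)):
        J = np.diag(1.0 - np.tanh(W @ x) ** 2) @ W   # Jacobian of f w.r.t. x
        lam = lam + h * J.T @ lam
        lams.append(lam)
    return list(reversed(lams))

Ws = [np.random.randn(4, 4) * 0.1 for _ in range(5)]
xs = forward(np.random.randn(4), Ws)
lams = adjoint(xs, Ws, dloss_dxT=xs[-1])   # e.g. terminal loss 0.5*||x_T||^2
```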
Large-scale Multi-modal Person Identification in Real Unconstrained Environments
Title | Large-scale Multi-modal Person Identification in Real Unconstrained Environments |
Authors | Jiajie Ye, Yisheng Guan, Junfa Liu, Xinghong Huang, Hong Zhang |
Abstract | Person identification (P-ID) under real unconstrained noisy environments is a huge challenge. In multiple-feature learning with Deep Convolutional Neural Networks (DCNNs) or other machine learning methods for large-scale person identification in the wild, the key is to design an appropriate strategy for decision-layer or feature-layer fusion that can enhance discriminative power. It is necessary to extract different types of valid features and establish a reasonable framework to fuse different types of information. In traditional methods, different persons are identified based on single-modal features, such as face, audio, and head features. These traditional methods cannot achieve a highly accurate level of person identification in real unconstrained environments. This study proposes a fusion module that fuses multi-modal features for person identification in real unconstrained environments. |
Tasks | Multi-Modal Person Identification, Person Identification |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.12134v1 |
https://arxiv.org/pdf/1912.12134v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-multi-modal-person-identification |
Repo | |
Framework | |
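A hedged sketch of a feature-level fusion module in the spirit of the abstract: each modality is projected to a common space and combined with learned per-modality weights. The feature dimensions, modality names, and gating scheme are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of multi-modal feature fusion with learned modality weights.
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    def __init__(self, dims=None, out_dim=256):
        super().__init__()
        dims = dims or {"face": 512, "audio": 128, "head": 64}   # assumed sizes
        self.proj = nn.ModuleDict({k: nn.Linear(d, out_dim) for k, d in dims.items()})
        self.gate = nn.Linear(out_dim * len(dims), len(dims))    # per-modality weights

    def forward(self, feats):
        projected = [torch.relu(self.proj[k](v)) for k, v in feats.items()]
        stacked = torch.stack(projected, dim=1)                   # (B, M, D)
        w = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # (B, M)
        return (w.unsqueeze(-1) * stacked).sum(dim=1)             # fused (B, D)

fusion = FusionModule()
fused = fusion({"face": torch.randn(2, 512),
                "audio": torch.randn(2, 128),
                "head": torch.randn(2, 64)})
```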
Learning Conserved Networks from Flows
Title | Learning Conserved Networks from Flows |
Authors | Satya Jayadev P., Shankar Narasimhan, Nirav Bhatt |
Abstract | The network reconstruction problem is one of the challenging problems in network science. This work deals with reconstructing networks in which the flows are conserved around the nodes. These networks are referred to as conserved networks. We propose the novel concept of a conservation graph for describing conserved networks and investigate its properties. We develop a methodology to reconstruct conserved networks from flows by combining these graph properties with learning techniques, with polynomial time complexity. We show that exact network reconstruction is possible for radial networks. Further, we extend the methodology to reconstructing networks from noisy data. We demonstrate the proposed methods on different types of radial networks. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08716v1 |
https://arxiv.org/pdf/1905.08716v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-conserved-networks-from-flows |
Repo | |
Framework | |
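One way to see what "flows are conserved around the nodes" gives you: every node balance is a vector a with a^T f = 0 for all observed flow vectors, so the conservation relations span the left null space of the flow-sample matrix. The sketch below recovers that subspace with an SVD on synthetic data; turning the subspace into an actual graph requires the graph-theoretic machinery described in the abstract.

```python
# Hedged sketch: recover the conservation subspace of conserved flows via SVD.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth constraint matrix A (node balances x edges): A @ f = 0
A = np.array([[ 1, -1,  0,  0],
              [ 0,  1, -1, -1],
              [-1,  0,  1,  1]], dtype=float)

# Generate conserved flows by sampling from the null space of A
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[np.linalg.matrix_rank(A):].T                    # edges x k
F = null_basis @ rng.normal(size=(null_basis.shape[1], 200))    # edges x samples

# Recover the conservation relations from data: left null space of F
U, s, _ = np.linalg.svd(F)
k = np.sum(s > 1e-8 * s[0])                                     # numerical rank of F
recovered = U[:, k:].T                                          # rows span the relations

# Check: recovered relations annihilate every observed flow vector
print(np.allclose(recovered @ F, 0, atol=1e-8))                 # True
```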
Transfer learning for Remaining Useful Life Prediction Based on Consensus Self-Organizing Models
Title | Transfer learning for Remaining Useful Life Prediction Based on Consensus Self-Organizing Models |
Authors | Yuantao Fan, Sławomir Nowaczyk, Thorsteinn Rögnvaldsson |
Abstract | The traditional paradigm for developing machine prognostics usually relies on generalization from data acquired in experiments under controlled conditions prior to deployment of the equipment. Detecting or predicting failures and estimating machine health in this way assumes that future field data will have a very similar distribution to the experiment data. However, many complex machines operate under dynamic environmental conditions and are used in many different ways. This makes collecting comprehensive data very challenging, and the assumption that pre-deployment data and post-deployment data follow very similar distributions is unlikely to hold. Transfer Learning (TL) refers to methods for transferring knowledge learned in one setting (the source domain) to another setting (the target domain). In this work, we present a TL method for predicting Remaining Useful Life (RUL) of equipment, under the assumption that labels are available only for the source domain and not the target domain. This setting corresponds to generalizing from a limited number of run-to-failure experiments performed prior to deployment into making prognostics with data coming from deployed equipment that is being used under multiple new operating conditions and experiencing previously unseen faults. We employ a deviation detection method, Consensus Self-Organizing Models (COSMO), to create transferable features for building the RUL regression model. These features capture how different target equipment is in comparison to its peers. The efficiency of the proposed TL method is demonstrated using the NASA Turbofan Engine Degradation Simulation Data Set. Models using the COSMO transferable features show better performance than other methods on predicting RUL when the target domain is more complex than the source domain. |
Tasks | Transfer Learning |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07053v3 |
https://arxiv.org/pdf/1909.07053v3.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-for-remaining-useful-life |
Repo | |
Framework | |
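A hedged sketch of a COSMO-style peer comparison as described in the abstract: each unit is summarized by a feature vector, its distance to the fleet consensus is ranked against its peers, and the resulting score becomes a transferable input to the RUL regressor. The median consensus and Euclidean distance below are illustrative choices, not necessarily those used in the paper.

```python
# Hedged sketch of a peer-comparison (COSMO-style) deviation score.
import numpy as np

def cosmo_score(unit_features, fleet_features):
    """unit_features: (d,) summary of one unit; fleet_features: (n_units, d)."""
    consensus = np.median(fleet_features, axis=0)
    d_unit = np.linalg.norm(unit_features - consensus)
    d_fleet = np.linalg.norm(fleet_features - consensus, axis=1)
    # fraction of peers at least as far from the consensus -> p-value-like score
    return np.mean(d_fleet >= d_unit)

rng = np.random.default_rng(1)
fleet = rng.normal(size=(50, 8))          # healthy peers
degraded = fleet[0] + 3.0                 # a clearly deviating unit
print(cosmo_score(degraded, fleet))       # small value: the unit is atypical
```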
Adaptive probabilistic principal component analysis
Title | Adaptive probabilistic principal component analysis |
Authors | Adam Farooq, Yordan P. Raykov, Luc Evers, Max A. Little |
Abstract | Using the linear Gaussian latent variable model as a starting point, we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and to project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables makes the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11010v1 |
https://arxiv.org/pdf/1905.11010v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-probabilistic-principal-component |
Repo | |
Framework | |
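For reference, the linear Gaussian starting point the abstract relaxes is standard probabilistic PCA, which has a closed-form maximum likelihood solution (Tipping & Bishop). The sketch below implements only that baseline; the adaptive, nonparametric extension with discrete latent features is not reproduced.

```python
# Hedged sketch of standard probabilistic PCA (the baseline model), using the
# closed-form maximum likelihood estimates.
import numpy as np

def ppca_ml(X, q):
    """X: (n, d) data, q: latent dimension. Returns W (d, q), sigma2, mean."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)             # ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
    sigma2 = eigval[q:].mean()                     # average discarded variance
    W = eigvec[:, :q] @ np.diag(np.sqrt(eigval[:q] - sigma2))
    return W, sigma2, mu

rng = np.random.default_rng(2)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
W, sigma2, mu = ppca_ml(X, q=2)
print(W.shape, round(sigma2, 4))
```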
Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations
Title | Cracking the Contextual Commonsense Code: Understanding Commonsense Reasoning Aptitude of Deep Contextual Representations |
Authors | Jeff Da, Jungo Kasai |
Abstract | Pretrained deep contextual representations have advanced the state-of-the-art on various commonsense NLP tasks, but we lack a concrete understanding of the capability of these models. Thus, we investigate and challenge several aspects of BERT’s commonsense representation abilities. First, we probe BERT’s ability to classify various object attributes, demonstrating that BERT shows a strong ability in encoding various commonsense features in its embedding space, but is still deficient in many areas. Next, we show that, by augmenting BERT’s pretraining data with additional data related to the deficient attributes, we are able to improve performance on a downstream commonsense reasoning task while using a minimal amount of data. Finally, we develop a method of fine-tuning knowledge graph embeddings alongside BERT and show the continued importance of explicit knowledge graphs. |
Tasks | Knowledge Graphs |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01157v2 |
https://arxiv.org/pdf/1910.01157v2.pdf | |
PWC | https://paperswithcode.com/paper/cracking-the-contextual-commonsense-code |
Repo | |
Framework | |
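A hedged sketch of the probing setup in spirit: a linear classifier fit on frozen contextual embeddings tests whether an object attribute is linearly decodable. The embeddings and labels below are random stand-ins; in practice they would come from BERT and from attribute annotations.

```python
# Hedged sketch of a linear attribute probe over precomputed embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(200, 768))     # stand-in for contextual vectors
labels = rng.integers(0, 2, size=200)        # stand-in attribute annotations

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, test_size=0.3,
                                          random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))   # ~0.5 on random stand-ins
```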
Learning to Infer Implicit Surfaces without 3D Supervision
Title | Learning to Infer Implicit Surfaces without 3D Supervision |
Authors | Shichen Liu, Shunsuke Saito, Weikai Chen, Hao Li |
Abstract | Recent advances in 3D deep learning have shown that it is possible to train highly effective deep models for 3D shape generation, directly from 2D images. This is particularly interesting since the availability of 3D models is still limited compared to the massive amount of accessible 2D images, which is invaluable for training. The representation of 3D surfaces itself is a key factor for the quality and resolution of the 3D output. While explicit representations, such as point clouds and voxels, can span a wide range of shape variations, their resolutions are often limited. Mesh-based representations are more efficient but are limited in their ability to handle varying topologies. Implicit surfaces, however, can robustly handle complex shapes, topologies, and also provide flexible resolution control. We address the fundamental problem of learning implicit surfaces for shape inference without the need of 3D supervision. Despite their advantages, it remains nontrivial to (1) formulate a differentiable connection between implicit surfaces and their 2D renderings, which is needed for image-based supervision; and (2) ensure precise geometric properties and control, such as local smoothness. In particular, sampling implicit surfaces densely is also known to be a computationally demanding and very slow operation. To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions. We demonstrate the effectiveness of our framework on the task of single-view image-based 3D shape digitization and show how we outperform state-of-the-art techniques both quantitatively and qualitatively. |
Tasks | 3D Shape Generation |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00767v1 |
https://arxiv.org/pdf/1911.00767v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-infer-implicit-surfaces-without |
Repo | |
Framework | |
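A toy illustration of ray-based field probing: sample points along each pixel's ray, query the implicit occupancy field, and pool along the ray to obtain a differentiable per-ray coverage value that a 2D silhouette can supervise. The analytic sphere field and the max pooling below are stand-ins for the learned field and the paper's aggregation scheme.

```python
# Hedged sketch of ray-based probing of an implicit occupancy field.
import torch

def occupancy_field(points):
    """Soft occupancy of a unit sphere at the origin (stand-in for a network)."""
    return torch.sigmoid(20.0 * (1.0 - points.norm(dim=-1)))

def render_silhouette(ray_origins, ray_dirs, n_samples=64, near=0.5, far=3.0):
    t = torch.linspace(near, far, n_samples)                            # (S,)
    pts = ray_origins[:, None, :] + t[None, :, None] * ray_dirs[:, None, :]
    occ = occupancy_field(pts)                                          # (R, S)
    return occ.max(dim=1).values                                        # per-ray coverage

origins = torch.tensor([[0.0, 0.0, -2.0]]).repeat(4, 1)
dirs = torch.nn.functional.normalize(
    torch.tensor([[0.0, 0.0, 1.0], [0.3, 0.0, 1.0],
                  [0.8, 0.0, 1.0], [1.5, 0.0, 1.0]]), dim=-1)
print(render_silhouette(origins, dirs))   # high for rays that hit the sphere
```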
Semantic Change and Semantic Stability: Variation is Key
Title | Semantic Change and Semantic Stability: Variation is Key |
Authors | Claire Bowern |
Abstract | I survey some recent approaches to studying change in the lexicon, particularly change in meaning across phylogenies. I briefly sketch an evolutionary approach to language change and point out some issues in recent approaches to studying semantic change that rely on temporally stratified word embeddings. I draw illustrations from lexical cognate models in Pama-Nyungan to identify meaning classes most appropriate for lexical phylogenetic inference, particularly highlighting the importance of variation in studying change over time. |
Tasks | Word Embeddings |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05760v1 |
https://arxiv.org/pdf/1906.05760v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-change-and-semantic-stability |
Repo | |
Framework | |
On Value Discrepancy of Imitation Learning
Title | On Value Discrepancy of Imitation Learning |
Authors | Tian Xu, Ziniu Li, Yang Yu |
Abstract | Imitation learning trains a policy from expert demonstrations. Imitation learning approaches have been designed from various principles, such as behavioral cloning via supervised learning, apprenticeship learning via inverse reinforcement learning, and GAIL via generative adversarial learning. In this paper, we propose a framework to analyze the theoretical properties of imitation learning approaches based on discrepancy propagation analysis. Under the infinite-horizon setting, the framework leads to a value discrepancy of behavioral cloning of order O((1-\gamma)^{-2}). We also show that the framework leads to a value discrepancy of GAIL of order O((1-\gamma)^{-1}). This implies that GAIL has fewer compounding errors than behavioral cloning, which is also verified empirically in this paper. To the best of our knowledge, we are the first to analyze GAIL’s performance theoretically. The above results indicate that the proposed framework is a general tool to analyze imitation learning approaches. We hope our theoretical results can provide insights for future improvements in imitation learning algorithms. |
Tasks | Imitation Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.07027v1 |
https://arxiv.org/pdf/1911.07027v1.pdf | |
PWC | https://paperswithcode.com/paper/on-value-discrepancy-of-imitation-learning |
Repo | |
Framework | |
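A small numeric illustration of the horizon dependence stated in the abstract: a (1-gamma)^{-2} bound for behavioral cloning compounds much faster than GAIL's (1-gamma)^{-1} bound as the effective horizon grows. The per-step error value below is a placeholder.

```python
# Hedged illustration of the quoted orders; constants are placeholders.
eps = 0.01                                   # hypothetical per-step policy error
for gamma in (0.9, 0.99, 0.999):
    bc_bound = eps / (1.0 - gamma) ** 2      # O((1-gamma)^-2) compounding
    gail_bound = eps / (1.0 - gamma)         # O((1-gamma)^-1)
    print(f"gamma={gamma}: BC~{bc_bound:.1f}  GAIL~{gail_bound:.1f}")
```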
Epistasis-based Basis Estimation Method for Simplifying the Problem Space of an Evolutionary Search in Binary Representation
Title | Epistasis-based Basis Estimation Method for Simplifying the Problem Space of an Evolutionary Search in Binary Representation |
Authors | Junghwan Lee, Yong-Hyuk Kim |
Abstract | An evolutionary search space can be smoothly transformed via a suitable change of basis; however, it can be difficult to determine an appropriate basis. In this paper, a method is proposed to select an optimum basis that can be used to simplify an evolutionary search space in a binary encoding scheme. The basis search method is based on a genetic algorithm, and the fitness evaluation is based on the epistasis, which is an indicator of the complexity of a genetic algorithm. Two tests were conducted to validate the proposed method when applied to two different evolutionary search problems. The first searched for an appropriate basis to apply, while the second searched for a solution to the test problem. The results obtained after the identified basis had been applied were compared to those with the original basis, and it was found that the proposed method provided superior results. |
Tasks | |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09103v1 |
http://arxiv.org/pdf/1904.09103v1.pdf | |
PWC | https://paperswithcode.com/paper/epistasis-based-basis-estimation-method-for |
Repo | |
Framework | |
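A hedged sketch of the two ingredients the abstract combines: a GF(2) change of basis applied to binary chromosomes, and a Davidor-style epistasis-variance estimate that could serve as the fitness signal when searching for a basis. The genetic-algorithm loop is omitted and the basis matrix is hand-picked for illustration; the paper's exact epistasis measure may differ.

```python
# Hedged sketch: GF(2) basis change plus an epistasis-variance style estimate.
import numpy as np

def change_basis(pop, B):
    """Apply y = B x over GF(2) to every chromosome in pop (rows)."""
    return (pop @ B.T) % 2

def epistasis_variance(pop, fitness):
    """Deviation of fitness from its best locus-wise additive approximation."""
    n, L = pop.shape
    mu = fitness.mean()
    linear = np.full(n, -(L - 1) * mu)
    for i in range(L):
        for v in (0, 1):
            mask = pop[:, i] == v
            if mask.any():
                linear[mask] += fitness[mask].mean()
    return np.mean((fitness - linear) ** 2)

rng = np.random.default_rng(4)
pop = rng.integers(0, 2, size=(64, 6))
fitness = pop[:, 2:].sum(axis=1) + 3.0 * (pop[:, 0] ^ pop[:, 1])  # XOR interaction
B = np.array([[1, 1, 0, 0, 0, 0],      # first new coordinate is x0 XOR x1
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1]])
# In the new basis the XOR interaction becomes additive, so the second
# estimate is typically much smaller than the first.
print(epistasis_variance(pop, fitness),
      epistasis_variance(change_basis(pop, B), fitness))
```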
Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs
Title | Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs |
Authors | Yu Bai, John Duchi, Song Mei |
Abstract | We study a family of (potentially non-convex) constrained optimization problems with convex composite structure. Through a novel analysis of non-smooth geometry, we show that proximal-type algorithms applied to exact penalty formulations of such problems exhibit local linear convergence under a quadratic growth condition, which the compositional structure we consider ensures. The main application of our results is to low-rank semidefinite optimization with Burer-Monteiro factorizations. We precisely identify the conditions for quadratic growth in the factorized problem via structures in the semidefinite problem, which could be of independent interest for understanding matrix factorization. |
Tasks | |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00184v1 |
http://arxiv.org/pdf/1903.00184v1.pdf | |
PWC | https://paperswithcode.com/paper/proximal-algorithms-for-constrained-composite |
Repo | |
Framework | |
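For context, a generic proximal gradient step for a composite objective g(x) + h(x) takes a gradient step on the smooth part and applies the proximal operator of the nonsmooth part. The lasso instance below (soft-thresholding prox) is only a template, not the paper's method; the exact-penalty and Burer-Monteiro setting studied there plugs different g and h into the same scheme.

```python
# Hedged sketch of a generic proximal gradient method for a composite objective.
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam=0.1, step=None, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 via x+ = prox(x - step*grad)."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(5)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true
print(np.round(proximal_gradient(A, b)[:8], 2))       # approximate sparse recovery
```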
Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering
Title | Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering |
Authors | Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, Alessandro Oltramari |
Abstract | Non-extractive commonsense QA remains a challenging AI task, as it requires systems to reason about, synthesize, and gather disparate pieces of information in order to generate responses to queries. Recent approaches to such tasks show increased performance only when models are pre-trained with additional information or when domain-specific heuristics are used, without any special consideration of the knowledge resource type. In this paper, we perform a survey of recent commonsense QA methods and provide a systematic analysis of popular knowledge resources and knowledge-integration methods, across benchmarks from multiple commonsense datasets. Our results and analysis show that attention-based injection seems to be a preferable choice for knowledge integration and that the degree of domain overlap between knowledge bases and datasets plays a crucial role in determining model success. |
Tasks | Question Answering |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.14087v1 |
https://arxiv.org/pdf/1910.14087v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-generalizable-neuro-symbolic-systems |
Repo | |
Framework | |
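A hedged sketch of attention-based knowledge injection, the integration method the analysis favors: the question encoding attends over embeddings of retrieved knowledge triples, and the weighted sum is added back residually. Dimensions and the single-head dot-product attention are illustrative assumptions, not the paper's exact model.

```python
# Hedged sketch of attention-based knowledge injection into a question encoding.
import torch
import torch.nn as nn

class KnowledgeInjection(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, question_vec, triple_embs):
        # question_vec: (B, D); triple_embs: (B, K, D) retrieved knowledge triples
        q = self.query(question_vec).unsqueeze(1)             # (B, 1, D)
        scores = (q * self.key(triple_embs)).sum(-1)          # (B, K)
        attn = torch.softmax(scores / triple_embs.size(-1) ** 0.5, dim=-1)
        injected = (attn.unsqueeze(-1) * self.value(triple_embs)).sum(1)
        return question_vec + injected                        # residual injection

layer = KnowledgeInjection()
out = layer(torch.randn(2, 256), torch.randn(2, 8, 256))
```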
Give me (un)certainty – An exploration of parameters that affect segmentation uncertainty
Title | Give me (un)certainty – An exploration of parameters that affect segmentation uncertainty |
Authors | Katharina Hoebel, Ken Chang, Jay Patel, Praveer Singh, Jayashree Kalpathy-Cramer |
Abstract | Segmentation tasks in medical imaging are inherently ambiguous: the boundary of a target structure is oftentimes unclear due to image quality and biological factors. As such, predicted segmentations from deep learning algorithms are inherently ambiguous. Additionally, “ground truth” segmentations performed by human annotators are in fact weak labels that further increase the uncertainty of outputs of supervised models developed on these manual labels. To date, most deep learning segmentation studies utilize predicted segmentations without uncertainty quantification. In contrast, we explore the use of Monte Carlo dropout U-Nets for segmentation with additional quantification of segmentation uncertainty. We assess the utility of three measures of uncertainty (Coefficient of Variation, Mean Pairwise Dice, and Mean Voxelwise Uncertainty) for the segmentation of a less ambiguous target structure (liver) and a more ambiguous one (liver tumors). Furthermore, we assess how the utility of these measures changes with different patch sizes and cost functions. Our results suggest that models trained using larger patches and the weighted categorical cross-entropy as cost function allow the extraction of more meaningful uncertainty measures compared to smaller patches and soft Dice loss. Among the three uncertainty measures, Mean Pairwise Dice shows the strongest correlation with segmentation quality. Our study serves as a proof of concept of how uncertainty measures can be used to assess the quality of a predicted segmentation, potentially serving to flag low-quality segmentations from a given model for further human review. |
Tasks | |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06357v1 |
https://arxiv.org/pdf/1911.06357v1.pdf | |
PWC | https://paperswithcode.com/paper/give-me-uncertainty-an-exploration-of |
Repo | |
Framework | |
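A hedged sketch of the three uncertainty measures named in the abstract, computed from a stack of Monte Carlo dropout predictions; the exact definitions used in the paper may differ in detail (voxelwise uncertainty is taken here as mean predictive entropy).

```python
# Hedged sketch: uncertainty measures from Monte Carlo dropout segmentations.
import numpy as np

def dice(a, b, eps=1e-8):
    return (2.0 * np.sum(a * b) + eps) / (np.sum(a) + np.sum(b) + eps)

def uncertainty_measures(mc_probs, thr=0.5):
    """mc_probs: (N, H, W) foreground probabilities from N stochastic passes."""
    masks = (mc_probs > thr).astype(float)
    volumes = masks.sum(axis=(1, 2))
    cv = volumes.std() / (volumes.mean() + 1e-8)                 # Coefficient of Variation
    pair_dice = [dice(masks[i], masks[j])                        # Mean Pairwise Dice
                 for i in range(len(masks)) for j in range(i + 1, len(masks))]
    mean_probs = mc_probs.mean(axis=0)                           # Mean Voxelwise Uncertainty
    voxel_unc = -(mean_probs * np.log(mean_probs + 1e-8)
                  + (1 - mean_probs) * np.log(1 - mean_probs + 1e-8))
    return cv, float(np.mean(pair_dice)), float(voxel_unc.mean())

rng = np.random.default_rng(6)
samples = np.clip(0.6 + 0.2 * rng.normal(size=(10, 64, 64)), 0, 1)
print(uncertainty_measures(samples))
```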