October 20, 2019


Paper Group ANR 2



Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection

Title Improving Matching Models with Hierarchical Contextualized Representations for Multi-turn Response Selection
Authors Chongyang Tao, Wei Wu, Can Xu, Yansong Feng, Dongyan Zhao, Rui Yan
Abstract In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots. Existing models, such as CoVe and ELMo, are trained with limited context (often a single sentence or paragraph) and may not work well on multi-turn conversations, due to their hierarchical nature, informal language, and domain-specific words. To address these challenges, we propose pre-training hierarchical contextualized representations, including contextual word-level and sentence-level representations, by learning a dialogue generation model from large-scale conversations with a hierarchical encoder-decoder architecture. The two levels of representations are then blended into the input and output layers of a matching model, respectively. Experimental results on two benchmark conversation datasets indicate that the proposed hierarchical contextualized representations bring significant and consistent improvements to existing matching models for response selection.
Tasks Dialogue Generation
Published 2018-08-22
URL https://arxiv.org/abs/1808.07244v2
PDF https://arxiv.org/pdf/1808.07244v2.pdf
PWC https://paperswithcode.com/paper/improving-matching-models-with-contextualized
Repo
Framework
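
The two-level blending can be made concrete with a short sketch. The following PyTorch fragment is our illustration, not the authors' code: module names and dimensions are assumptions, and the pre-trained hierarchical encoder that supplies `ctx_word_repr` and `ctx_sent_repr` is treated as given. Contextual word-level states are gated into the input layer, while the sentence-level vector joins at the output (scoring) layer.

```python
# Hypothetical sketch (not the authors' code) of blending pre-trained
# word-level and sentence-level representations into a matching model.
import torch
import torch.nn as nn

class BlendedMatcher(nn.Module):
    def __init__(self, vocab_size, emb_dim=200, ctx_dim=300, hidden=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # learned gate blending word embeddings with contextual word states
        self.word_gate = nn.Linear(emb_dim + ctx_dim, emb_dim + ctx_dim)
        self.utt_rnn = nn.GRU(emb_dim + ctx_dim, hidden, batch_first=True)
        # the sentence-level representation joins at the scoring layer
        self.score = nn.Linear(hidden + ctx_dim, 1)

    def forward(self, tokens, ctx_word_repr, ctx_sent_repr):
        # tokens: (B, T) ids; ctx_word_repr: (B, T, ctx_dim) from the
        # pre-trained hierarchical encoder; ctx_sent_repr: (B, ctx_dim)
        x = torch.cat([self.embedding(tokens), ctx_word_repr], dim=-1)
        x = x * torch.sigmoid(self.word_gate(x))   # gated blend at the input
        _, h = self.utt_rnn(x)                     # (1, B, hidden)
        fused = torch.cat([h.squeeze(0), ctx_sent_repr], dim=-1)
        return self.score(fused).squeeze(-1)       # matching logit
```

The gate lets the matching model decide, per dimension, how much of the pre-trained conversational signal to mix in.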

Gradient Adversarial Training of Neural Networks

Title Gradient Adversarial Training of Neural Networks
Authors Ayan Sinha, Zhao Chen, Vijay Badrinarayanan, Andrew Rabinovich
Abstract We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage a prior belief that in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard task-based training. We demonstrate gradient adversarial training in three scenarios: (1) as a defense against adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we do binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher’s gradient tensor; and (3) for multi-task learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios we show the potential of the gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, distills knowledge from a teacher network to a student network better than soft targets, and boosts multi-task learning by aligning the gradient tensors derived from the task-specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about the tasks being trained and can support diverse machine learning problems when intelligently guided through adversarialization using an auxiliary network.
Tasks Multi-Task Learning
Published 2018-06-21
URL http://arxiv.org/abs/1806.08028v1
PDF http://arxiv.org/pdf/1806.08028v1.pdf
PWC https://paperswithcode.com/paper/gradient-adversarial-training-of-neural
Repo
Framework
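
For the knowledge-distillation scenario (2), the training loop might look roughly like the following PyTorch sketch. This is our reading of the abstract, not the released implementation; the toy linear models, the choice of input gradients, and the 0.1 adversarial weight are all placeholders.

```python
# Illustrative sketch of gradient adversarial training for distillation:
# an auxiliary net classifies whether a gradient tensor came from the
# student or the teacher, and the student is trained to fool it.
import torch
import torch.nn as nn
import torch.nn.functional as F

def input_gradient(model, x, y, keep_graph):
    """Gradient of the task loss w.r.t. the input x."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (g,) = torch.autograd.grad(loss, x, create_graph=keep_graph)
    return g.flatten(1)

student, teacher = nn.Linear(20, 5), nn.Linear(20, 5)   # stand-in models
aux = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(aux.parameters(), lr=1e-3)

x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
g_s = input_gradient(student, x, y, keep_graph=True)    # stays differentiable
g_t = input_gradient(teacher, x, y, keep_graph=False).detach()

# auxiliary step: classify the origin of each gradient tensor
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)
d_loss = (F.binary_cross_entropy_with_logits(aux(g_s.detach()), ones)
          + F.binary_cross_entropy_with_logits(aux(g_t), zeros))
opt_a.zero_grad(); d_loss.backward(); opt_a.step()

# student step: task loss plus an adversarial term pushing the student's
# gradient tensors to look like the teacher's (label flipped for aux);
# gradients accumulated in aux here are discarded by opt_s.
task = F.cross_entropy(student(x), y)
adv = F.binary_cross_entropy_with_logits(aux(g_s), zeros)
opt_s.zero_grad(); (task + 0.1 * adv).backward(); opt_s.step()
```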

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

Title Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
Authors Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su
Abstract Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms—Nesterov’s accelerated gradient method for strongly convex functions (NAG-SC) and Polyak’s heavy-ball method—we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak’s heavy-ball method, but they allow the identification of a term that we refer to as “gradient correction” that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov’s accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result—that NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
Tasks
Published 2018-10-21
URL http://arxiv.org/abs/1810.08907v3
PDF http://arxiv.org/pdf/1810.08907v3.pdf
PWC https://paperswithcode.com/paper/understanding-the-acceleration-phenomenon-via
Repo
Framework
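
The "gradient correction" is easy to see numerically. In single-sequence form, NAG-SC equals the heavy-ball update plus the extra term $-\beta s (\nabla f(x_k) - \nabla f(x_{k-1}))$. The toy NumPy comparison below (ours, not the paper's code) runs both methods on an ill-conditioned quadratic with identical step size and momentum, so that this correction term is the only difference between them.

```python
# NAG-SC vs. Polyak's heavy-ball on f(x) = 0.5 * x^T A x (strongly convex).
import numpy as np

A = np.diag([100.0, 1.0])          # condition number 100, mu = 1, L = 100
grad = lambda x: A @ x
mu, L = 1.0, 100.0
s = 1.0 / (4 * L)                  # step size
beta = (1 - np.sqrt(mu * s)) / (1 + np.sqrt(mu * s))   # shared momentum

def heavy_ball(x0, iters=200):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x, x_prev = x - s * grad(x) + beta * (x - x_prev), x
    return x

def nag_sc(x0, iters=200):
    # gradient taken at the extrapolated point: the discrete footprint
    # of the gradient-correction term in the high-resolution ODE
    x, y_prev = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x - s * grad(x)
        x, y_prev = y + beta * (y - y_prev), y
    return x

x0 = np.array([1.0, 1.0])
print("heavy-ball |grad|:", np.linalg.norm(grad(heavy_ball(x0))))
print("NAG-SC     |grad|:", np.linalg.norm(grad(nag_sc(x0))))
```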

Online Cluster Validity Indices for Streaming Data

Title Online Cluster Validity Indices for Streaming Data
Authors Masud Moshtaghi, James C. Bezdek, Sarah M. Erfani, Christopher Leckie, James Bailey
Abstract Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept has not been extended to the online setting. A key challenge is to find an efficient incremental formulation of an index that can capture both cohesion and separation of the clusters over potentially infinite data streams. In this paper, we develop two online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin internal validity indices, and analyze their characteristics, using two streaming clustering algorithms (sk-means and online ellipsoidal clustering), and illustrate their use in monitoring evolving clusters in streaming data. We also show that incremental cluster validity indices are capable of sending a distress signal to online monitors when evolving clusters go awry. Our numerical examples indicate that the incremental Xie-Beni index with forgetting factor is superior to the other three indices tested.
Tasks
Published 2018-01-08
URL http://arxiv.org/abs/1801.02937v1
PDF http://arxiv.org/pdf/1801.02937v1.pdf
PWC https://paperswithcode.com/paper/online-cluster-validity-indices-for-streaming
Repo
Framework
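
As a flavor of what an incremental index looks like, here is a hedged NumPy sketch of a crisp Xie-Beni-style index with exponential forgetting. The paper derives its own exact recursions (and a Davies-Bouldin counterpart), which may differ; this version simply maintains an exponentially weighted compactness sum and renormalizes by the minimum center separation.

```python
# One plausible incremental (crisp) Xie-Beni index with a forgetting factor;
# an assumed formulation, not necessarily the paper's exact recursion.
import numpy as np

class IncrementalXB:
    def __init__(self, forgetting=0.99):
        self.lam = forgetting
        self.compactness = 0.0   # exponentially weighted sum of squared errors
        self.n_eff = 0.0         # effective number of points seen

    def update(self, x, centers):
        """x: new point (d,); centers: current centers (k, d) maintained
        by the streaming clustering algorithm (e.g., sk-means)."""
        d2 = ((centers - x) ** 2).sum(axis=1)
        self.compactness = self.lam * self.compactness + d2.min()
        self.n_eff = self.lam * self.n_eff + 1.0
        # minimum pairwise squared separation between cluster centers
        diff = centers[:, None, :] - centers[None, :, :]
        sep = (diff ** 2).sum(-1)
        sep[np.diag_indices(len(centers))] = np.inf
        return self.compactness / (self.n_eff * sep.min())
```

Each call consumes one new point plus the clusterer's current centers and returns the up-to-date index value, so a monitor can flag a "distress signal" when the index deteriorates.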

ATM: Adversarial-neural Topic Model

Title ATM: Adversarial-neural Topic Model
Authors Rui Wang, Deyu Zhou, Yulan He
Abstract Topic models are widely used for thematic structure discovery in text, but traditional topic models often require dedicated inference procedures for the specific tasks at hand, and they are not designed to generate word-level semantic representations. To address these limitations, we propose a topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM). The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. Meanwhile, the generator can also produce word-level semantic representations. To illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open-domain event extraction. Our experimental results on two public corpora show that ATM generates more coherent topics, outperforming a number of competitive baselines. Moreover, ATM is able to extract meaningful events from news articles.
Tasks Topic Models
Published 2018-11-01
URL https://arxiv.org/abs/1811.00265v2
PDF https://arxiv.org/pdf/1811.00265v2.pdf
PWC https://paperswithcode.com/paper/atmadversarial-neural-topic-model
Repo
Framework
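
A minimal sketch of the adversarial setup as we read it (assumed architecture; layer sizes and the Dirichlet concentration are ours): the generator maps a Dirichlet-distributed topic proportion vector to a distribution over the vocabulary, and the discriminator scores normalized bag-of-words vectors.

```python
# GAN-style topic-model sketch in the spirit of ATM (not the authors' code).
import torch
import torch.nn as nn

V, K = 2000, 20                       # vocabulary size, number of topics
dirichlet = torch.distributions.Dirichlet(torch.full((K,), 0.1))

generator = nn.Sequential(
    nn.Linear(K, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, V), nn.Softmax(dim=-1),   # word distribution per fake doc
)
discriminator = nn.Sequential(
    nn.Linear(V, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1),
)

theta = dirichlet.sample((32,))       # (32, K) topic proportions
fake_docs = generator(theta)          # (32, V) word distributions
scores = discriminator(fake_docs)     # adversarial logits
```

Training would then alternate standard GAN losses on these scores against the scores of real documents represented as normalized bag-of-words vectors.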

Variational Bayesian Reinforcement Learning with Regret Bounds

Title Variational Bayesian Reinforcement Learning with Regret Bounds
Authors Brendan O’Donoghue
Abstract We consider the exploration-exploitation trade-off in reinforcement learning and we show that an agent imbued with an epistemic-risk-seeking utility function is able to explore efficiently, as measured by regret. The parameter that controls how risk-seeking the agent is can be optimized to minimize regret, or annealed according to a schedule. We call the resulting algorithm K-learning and we show that the K-values that the agent maintains are optimistic for the expected optimal Q-values at each state-action pair. The utility function approach induces a natural Boltzmann exploration policy for which the ‘temperature’ parameter is equal to the risk-seeking parameter. This policy achieves a Bayesian regret bound of $\tilde O(L^{3/2} \sqrt{SAT})$, where L is the time horizon, S is the number of states, A is the number of actions, and T is the total number of elapsed time-steps. K-learning can be interpreted as mirror descent in the policy space, and it is similar to other well-known methods in the literature, including Q-learning, soft-Q-learning, and maximum entropy policy gradient. K-learning is simple to implement, as it only requires adding a bonus to the reward at each state-action and then solving a Bellman equation. We conclude with a numerical example demonstrating that K-learning is competitive with other state-of-the-art algorithms in practice.
Tasks Q-Learning
Published 2018-07-25
URL https://arxiv.org/abs/1807.09647v2
PDF https://arxiv.org/pdf/1807.09647v2.pdf
PWC https://paperswithcode.com/paper/variational-bayesian-reinforcement-learning
Repo
Framework
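
The abstract's recipe (add a bonus to the reward, solve a Bellman equation, act with a Boltzmann policy whose temperature equals the risk-seeking parameter) suggests a tabular sketch like the one below. The soft backup shape is our assumption, and the bonus is left as an input because the paper derives its exact form.

```python
# Rough tabular sketch of the structure described in the abstract;
# not the paper's exact K-learning algorithm.
import numpy as np

def k_learning_backup(R, P, bonus, tau, gamma=0.99, iters=500):
    """R: rewards (S, A); P: transitions (S, A, S); bonus: (S, A);
    tau: risk-seeking parameter, doubling as the Boltzmann temperature."""
    S, A = R.shape
    K = np.zeros((S, A))
    for _ in range(iters):
        V = tau * np.log(np.exp(K / tau).sum(axis=1))    # soft state value
        K = R + bonus + gamma * P @ V                    # Bellman backup
    policy = np.exp(K / tau)
    return K, policy / policy.sum(axis=1, keepdims=True)  # Boltzmann policy
```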

Abstract: Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients

Title Abstract: Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients
Authors Imon Banerjee, Michael Francis Gensheimer, Douglas J. Wood, Solomon Henry, Daniel Chang, Daniel L. Rubin
Abstract We propose a deep learning model - Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) - for estimating the short-term life expectancy (3 months) of patients by analyzing free-text clinical notes in the electronic medical record, while maintaining the temporal visit sequence. In a single framework, we integrated semantic data mapping and a neural embedding technique to produce a text processing method that extracts relevant information from heterogeneous types of clinical notes in an unsupervised manner, and we designed a recurrent neural network to model the temporal dependency of the patient visits. The model was trained on a large dataset (10,293 patients) and validated on a separate dataset (1,818 patients). Our method achieved an area under the ROC curve (AUC) of 0.89. To provide explainability, we developed an interactive graphical tool that may improve physician understanding of the basis for the model’s predictions. The high accuracy and explainability of the PPES-Met model may enable it to be used as a decision support tool to personalize metastatic cancer treatment, providing valuable assistance to physicians.
Tasks
Published 2018-01-09
URL http://arxiv.org/abs/1801.03058v2
PDF http://arxiv.org/pdf/1801.03058v2.pdf
PWC https://paperswithcode.com/paper/abstract-probabilistic-prognostic-estimates
Repo
Framework
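
Schematically, the sequence model could be as simple as the following PyTorch sketch (hypothetical names and dimensions; the upstream semantic mapping and neural embedding that turn each visit's notes into a vector are treated as given):

```python
# Sketch of an RNN over visit-level note embeddings (our illustration).
import torch
import torch.nn as nn

class VisitSequenceModel(nn.Module):
    def __init__(self, note_dim=256, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(note_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, visits):
        # visits: (batch, n_visits, note_dim) embeddings of each visit's notes
        _, (h, _) = self.rnn(visits)
        return torch.sigmoid(self.head(h[-1]))  # short-term survival probability

model = VisitSequenceModel()
p = model(torch.randn(4, 10, 256))   # 4 patients, 10 visits each
```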

Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization

Title Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
Authors Feihu Huang, Songcan Chen
Abstract With the rapid rise of complex data, nonconvex models, such as nonconvex loss functions and nonconvex regularizers, are widely used in machine learning and pattern recognition. In this paper, we propose a class of mini-batch stochastic ADMMs (alternating direction methods of multipliers) for solving large-scale nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch size, the mini-batch stochastic ADMM without a variance reduction (VR) technique is convergent and reaches a convergence rate of $O(1/T)$ to obtain a stationary point of the nonconvex optimization, where $T$ denotes the number of iterations. Moreover, we extend the mini-batch stochastic gradient method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript \cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also reach the convergence rate of $O(1/T)$ without any condition on the mini-batch size. In particular, we provide a specific parameter selection for the step size $\eta$ of the stochastic gradients and the penalty parameter $\rho$ of the augmented Lagrangian function. Finally, extensive experimental results on both simulated and real-world data demonstrate the effectiveness of the proposed algorithms.
Tasks
Published 2018-02-08
URL https://arxiv.org/abs/1802.03284v3
PDF https://arxiv.org/pdf/1802.03284v3.pdf
PWC https://paperswithcode.com/paper/mini-batch-stochastic-admms-for-nonconvex
Repo
Framework
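
To show the shape of the method, here is a hedged NumPy sketch of a linearized mini-batch stochastic ADMM on an assumed instance, $\min_x f(x) + \lambda |z|_1$ subject to $x = z$, with a nonconvex sigmoid loss; the paper treats a broader nonconvex nonsmooth class and also the SVRG/SAGA variants.

```python
# Generic linearized mini-batch stochastic ADMM sketch (illustrative instance).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_admm(grad_f, sample_batch, d, lam=0.1, rho=1.0,
                    eta=10.0, iters=2000):
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    for _ in range(iters):
        g = grad_f(x, sample_batch())                    # mini-batch gradient
        x = (eta * x - g + rho * (z - u)) / (eta + rho)  # linearized x-step
        z = soft_threshold(x + u, lam / rho)             # prox step for ||.||_1
        u = u + x - z                                    # dual update
    return z

# toy usage with the nonconvex sigmoid loss f(x) = mean_i sigma(-b_i a_i^T x)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 50)), rng.choice([-1.0, 1.0], 200)

def grad_f(x, idx):
    s = 1.0 / (1.0 + np.exp(-b[idx] * (A[idx] @ x)))     # sigma(b a^T x)
    return A[idx].T @ (-b[idx] * s * (1.0 - s)) / len(idx)

x_hat = stochastic_admm(grad_f, lambda: rng.choice(200, 32, replace=False), d=50)
```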

Quantum aspects of high dimensional formal representation of conceptual spaces

Title Quantum aspects of high dimensional formal representation of conceptual spaces
Authors Ishwarya M S, Aswani Kumar Cherukuri
Abstract Human cognition is a complex process facilitated by the intricate architecture of the human brain. Nevertheless, human cognition is often analyzed in terms of quantum-theoretic events, because of the correlative conjectures between the two domains. In this paper, we begin our analysis of human cognition via formal methods and proceed towards quantum theories. Human cognition often violates the classical probabilities on which formal representations of conceptual spaces are built. Further, the geometric representation of conceptual spaces proposed by Gardenfors captures the underlying content but lacks a systematic approach (Gardenfors, 2000; Kitto et al., 2012). These views are not contradictory; they are different perspectives, with a gap remaining towards a sufficient understanding of the human cognitive process. A comprehensive and systematic approach to modeling relatively complex scenarios can be addressed by the vector-space approach to conceptual spaces discussed in the literature. In this research, we propose an approach that uses both the formal representation and Gardenfors’ geometric approach. The proposed model of a high-dimensional formal representation of conceptual space is mathematically analyzed and inferred to exhibit quantum aspects. The proposed model also achieves cognition, in particular consciousness, and we demonstrate this process of achieving consciousness with a constructive learning scenario. Finally, we propose an algorithm for the conceptual scaling of a real-world scenario under different quality dimensions to obtain a conceptual scale.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1806.11338v1
PDF http://arxiv.org/pdf/1806.11338v1.pdf
PWC https://paperswithcode.com/paper/quantum-aspects-of-high-dimensional-formal
Repo
Framework

On Latent Distributions Without Finite Mean in Generative Models

Title On Latent Distributions Without Finite Mean in Generative Models
Authors Damian Leśniak, Igor Sieradzki, Igor Podolak
Abstract We investigate the properties of multidimensional probability distributions in the context of latent-space prior distributions of implicit generative models. Our work revolves around the phenomena arising when decoding linear interpolations between two random latent vectors: regions of latent space in close proximity to the origin are sampled, causing a distribution mismatch. We show that, due to the Central Limit Theorem, this region is almost never sampled during the training process. As a result, linear interpolations may generate unrealistic data, and their usage as a tool to check the quality of the trained model is questionable. We propose to use the multidimensional Cauchy distribution as the latent prior. The Cauchy distribution does not satisfy the assumptions of the CLT and has a number of properties that allow it to work well in conjunction with linear interpolations. We also provide two general methods of creating non-linear interpolations that are easily applicable to a large family of common latent distributions. Finally, we empirically analyze the quality of data generated from low-probability-mass regions for the DCGAN model on the CelebA dataset.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01670v1
PDF http://arxiv.org/pdf/1806.01670v1.pdf
PWC https://paperswithcode.com/paper/on-latent-distributions-without-finite-mean
Repo
Framework
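
The claim about interpolation midpoints is easy to verify numerically. The NumPy check below (ours, not the paper's code) contrasts a Gaussian prior, whose midpoints shrink from a norm of about $\sqrt{d}$ to about $\sqrt{d/2}$, with a componentwise Cauchy prior, one common construction of a multidimensional Cauchy, whose midpoints follow exactly the endpoint distribution by the stability property.

```python
# Midpoint norms under Gaussian vs. Cauchy latent priors.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 10000

g1, g2 = rng.standard_normal((n, d)), rng.standard_normal((n, d))
mid = 0.5 * (g1 + g2)                 # midpoint is N(0, I/2): norm ~ sqrt(d/2)
print("Gaussian endpoint norm:", np.linalg.norm(g1, axis=1).mean())   # ~10
print("Gaussian midpoint norm:", np.linalg.norm(mid, axis=1).mean())  # ~7.1

c1, c2 = rng.standard_cauchy((n, d)), rng.standard_cauchy((n, d))
cmid = 0.5 * (c1 + c2)
# the average of i.i.d. Cauchy variables is Cauchy with the same scale,
# so midpoints are distributed exactly like the endpoints
print("Cauchy median endpoint norm:", np.median(np.linalg.norm(c1, axis=1)))
print("Cauchy median midpoint norm:", np.median(np.linalg.norm(cmid, axis=1)))
```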

Direct Sparse Visual-Inertial Odometry using Dynamic Marginalization

Title Direct Sparse Visual-Inertial Odometry using Dynamic Marginalization
Authors Lukas von Stumberg, Vladyslav Usenko, Daniel Cremers
Abstract We present VI-DSO, a novel approach for visual-inertial odometry, which jointly estimates camera poses and sparse scene geometry by minimizing photometric and IMU measurement errors in a combined energy functional. The visual part of the system performs a bundle-adjustment-like optimization on a sparse set of points, but unlike keypoint-based systems it directly minimizes a photometric error. This makes it possible for the system to track not only corners, but any pixels with large enough intensity gradients. IMU information is accumulated between several frames using measurement preintegration and is inserted into the optimization as an additional constraint between keyframes. We explicitly include scale and gravity direction in our model and jointly optimize them together with other variables such as poses. As the scale is often not immediately observable using IMU data, this allows us to initialize our visual-inertial system with an arbitrary scale instead of having to delay the initialization until everything is observable. We perform partial marginalization of old variables so that updates can be computed in a reasonable time. To keep the system consistent, we propose a novel strategy which we call “dynamic marginalization”. This technique allows us to use partial marginalization even in cases where the initial scale estimate is far from the optimum. We evaluate our method on the challenging EuRoC dataset, showing that VI-DSO outperforms the state of the art.
Tasks
Published 2018-04-16
URL http://arxiv.org/abs/1804.05625v1
PDF http://arxiv.org/pdf/1804.05625v1.pdf
PWC https://paperswithcode.com/paper/direct-sparse-visual-inertial-odometry-using
Repo
Framework
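
Schematically, and simplifying the notation from the DSO line of work (this is our paraphrase, not the paper's exact formulation), the combined energy couples a robust photometric term with a Mahalanobis-weighted inertial term:

$$E = \sum_{i}\sum_{\mathbf{p}\in\mathcal{P}_i}\sum_{j\in\mathrm{obs}(\mathbf{p})} \big\| I_j[\mathbf{p}'] - I_i[\mathbf{p}] \big\|_{\gamma} \; + \; \lambda \sum_{(i,j)} \mathbf{r}_{ij}^{\top}\,\Sigma_{ij}^{-1}\,\mathbf{r}_{ij},$$

where $\mathbf{p}'$ is the reprojection of point $\mathbf{p}$ into keyframe $j$ (a function of the poses, inverse depth, scale, and gravity direction), $\|\cdot\|_{\gamma}$ is a Huber norm, and $\mathbf{r}_{ij}$ is the preintegrated IMU residual between keyframes with covariance $\Sigma_{ij}$. Dynamic marginalization then controls which old variables are folded into the quadratic prior that replaces them.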

Instance Retrieval at Fine-grained Level Using Multi-Attribute Recognition

Title Instance Retrieval at Fine-grained Level Using Multi-Attribute Recognition
Authors Roshanak Zakizadeh, Yu Qian, Michele Sasdelli, Eduard Vazquez
Abstract In this paper, we present a method for instance ranking and retrieval at a fine-grained level based on the global features extracted from a multi-attribute recognition model, without depending on landmark information or part-based annotations. Further, we make this architecture suitable for mobile-device applications by adopting a bilinear CNN to make the multi-attribute recognition model smaller (in terms of the number of parameters). Experiments on the Dress category of the DeepFashion In-Shop Clothes Retrieval dataset and on CUB200 show that the results of instance retrieval at a fine-grained level are promising for these datasets, especially in terms of texture and color.
Tasks
Published 2018-11-07
URL http://arxiv.org/abs/1811.02949v1
PDF http://arxiv.org/pdf/1811.02949v1.pdf
PWC https://paperswithcode.com/paper/instance-retrieval-at-fine-grained-level
Repo
Framework
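
The retrieval step implied by the abstract reduces to nearest-neighbor ranking on the global feature vector, which would presumably come from the penultimate layer of the bilinear multi-attribute network. A minimal NumPy sketch (function names are ours):

```python
# Rank a gallery by cosine similarity to a query's global feature vector.
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)          # gallery indices, best match first
```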

Theory of Machine Networks: A Case Study

Title Theory of Machine Networks: A Case Study
Authors Rooz Mahdavian, Richard Diehl Martinez
Abstract We propose a simplification of the Theory-of-Mind Network architecture, which focuses on modeling complex, deterministic machines as a proxy for modeling nondeterministic, conscious entities. We then validate this architecture in the context of understanding engines, which, we argue, meet the required internal and external complexity to yield meaningful abstractions.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.09785v1
PDF http://arxiv.org/pdf/1806.09785v1.pdf
PWC https://paperswithcode.com/paper/theory-of-machine-networks-a-case-study
Repo
Framework

On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks

Title On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks
Authors Yanzhi Wang, Zheng Zhan, Jiayu Li, Jian Tang, Bo Yuan, Liang Zhao, Wujie Wen, Siyue Wang, Xue Lin
Abstract Large-scale deep neural networks are both memory-intensive and computation-intensive, thereby posing stringent requirements on computing platforms. Hardware acceleration of deep neural networks has been extensively investigated in both industry and academia. Specific forms of binary neural networks (BNNs) and stochastic computing-based neural networks (SCNNs) are particularly appealing for hardware implementations, since they can be implemented almost entirely with binary operations. Despite their obvious advantages in hardware implementation, these approximate computing techniques have been questioned by researchers in terms of accuracy and universal applicability. It is also important to understand the relative pros and cons of SCNNs and BNNs both in theory and in actual hardware implementations. To address these concerns, in this paper we prove that “ideal” SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to their stochastic behavior). The proof is conducted by first proving the property for SCNNs from the strong law of large numbers, and then using SCNNs as a “bridge” to prove it for BNNs. Based on the universal approximation property, we further prove that SCNNs and BNNs exhibit the same energy complexity; in other words, they have the same asymptotic energy consumption as the network size grows. We also provide a detailed analysis of the pros and cons of SCNNs and BNNs for hardware implementations, and conclude that SCNNs are more suitable for hardware.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05391v2
PDF http://arxiv.org/pdf/1803.05391v2.pdf
PWC https://paperswithcode.com/paper/on-the-universal-approximation-property-and
Repo
Framework
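
The stochastic-computing primitive at the heart of the proof is easy to demonstrate: encoding numbers in [0, 1] as random bitstreams lets a single AND gate multiply them, and the strong law of large numbers drives the estimate to the exact product as the stream grows, which is precisely the "with probability 1" mechanism the abstract invokes. A toy NumPy illustration (ours):

```python
# Stochastic-computing multiplication: one AND gate over random bitstreams.
import numpy as np

rng = np.random.default_rng(1)

def sc_multiply(a, b, n_bits):
    sa = rng.random(n_bits) < a      # unary bitstream encoding a
    sb = rng.random(n_bits) < b      # unary bitstream encoding b
    return np.mean(sa & sb)          # AND gate + counter estimates a * b

for n in (10, 1000, 100000):
    print(n, sc_multiply(0.8, 0.6, n))   # converges to 0.48 as n grows
```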

A Differential Topological View of Challenges in Learning with Feedforward Neural Networks

Title A Differential Topological View of Challenges in Learning with Feedforward Neural Networks
Authors Hao Shen
Abstract Among the many unsolved puzzles in the theory of Deep Neural Networks (DNNs), there are three fundamental challenges that highly demand solutions, namely, expressibility, optimisability, and generalisability. Although there has been significant progress in seeking answers using various theories, e.g. information bottleneck theory, sparse representation, statistical inference, and Riemannian geometry, so far no single theory is able to provide solutions to all of these challenges. In this work, we propose to engage the theory of differential topology to address the three problems. By modelling the dataset of interest as a smooth manifold, DNNs can be considered as compositions of smooth maps between smooth manifolds. Specifically, our work offers a differential topological view of the loss landscape of DNNs, of the interplay between width and depth in expressibility, and of regularisations for generalisability. Finally, in the setting of deep representation learning, we further apply quotient topology to investigate the architecture of DNNs, which makes it possible to capture nuisance factors in data with respect to a specific learning task.
Tasks Representation Learning
Published 2018-11-26
URL http://arxiv.org/abs/1811.10304v1
PDF http://arxiv.org/pdf/1811.10304v1.pdf
PWC https://paperswithcode.com/paper/a-differential-topological-view-of-challenges
Repo
Framework