April 3, 2020


Paper Group ANR 37


Bayesian Reasoning with Deep-Learned Knowledge. Using News Articles and Financial Data to predict the likelihood of bankruptcy. Dataless Model Selection with the Deep Frame Potential. Memory-Constrained No-Regret Learning in Adversarial Bandits. Digital Collaborator: Augmenting Task Abstraction in Visualization Design with Artificial Intelligence. …

Bayesian Reasoning with Deep-Learned Knowledge

Title Bayesian Reasoning with Deep-Learned Knowledge
Authors Jakob Knollmüller, Torsten Enßlin
Abstract We access the internalized understanding of trained deep neural networks to perform Bayesian reasoning on complex tasks. Independently trained networks are arranged to jointly answer questions outside their original scope, which are formulated in terms of a Bayesian inference problem. We solve this problem approximately with variational inference, which also quantifies the uncertainty of the outcomes. We demonstrate how the following tasks can be approached this way: combining independently trained networks to sample from a conditional generator, solving riddles involving multiple constraints simultaneously, and combining deep-learned knowledge with conventional noisy measurements in the context of high-resolution images of human faces.
Tasks Bayesian Inference
Published 2020-01-29
URL https://arxiv.org/abs/2001.11031v1
PDF https://arxiv.org/pdf/2001.11031v1.pdf
PWC https://paperswithcode.com/paper/bayesian-reasoning-with-deep-learned
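The paper solves its inference problems approximately with variational inference over deep-network priors. As a much simpler stand-in (an assumption for illustration, not the authors' method), the scalar Gaussian case below shows how a prior and one noisy measurement combine into a posterior whose variance expresses the remaining uncertainty. The function name and parameter values are hypothetical.

```python
import math

def gaussian_posterior(prior_mean, prior_var, measurement, noise_var):
    """Combine a Gaussian prior with one noisy Gaussian measurement.

    Closed-form conjugate update (no variational approximation needed here).
    Returns the posterior mean and variance.
    """
    # Precisions (inverse variances) add under a Gaussian likelihood.
    post_precision = 1.0 / prior_var + 1.0 / noise_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + measurement / noise_var)
    return post_mean, post_var

if __name__ == "__main__":
    mean, var = gaussian_posterior(prior_mean=0.0, prior_var=1.0,
                                   measurement=2.0, noise_var=1.0)
    print(f"posterior mean={mean:.3f}, std={math.sqrt(var):.3f}")
```

In the paper's setting the prior comes from trained networks and the posterior has no closed form, which is why an approximate method such as variational inference is needed.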

Using News Articles and Financial Data to predict the likelihood of bankruptcy

Title Using News Articles and Financial Data to predict the likelihood of bankruptcy
Authors Michael Filletti, Aaron Grech
Abstract Over the past decade, millions of companies have filed for bankruptcy, for reasons such as high interest rates, heavy debts and government regulations. A company going bankrupt can be devastating, hurting not only workers and shareholders but also clients, suppliers and any related external companies. One aim of this paper is to provide a framework for predicting company bankruptcy from financial figures, provided by our external dataset, in conjunction with the sentiment of news articles about certain sectors. News articles are used to quantify the sentiment on a company and its sector from an external perspective, rather than relying only on internal figures. This work builds on previous studies by multiple researchers, with the goal of lessening the impact of such events.
Published 2020-03-22
URL https://arxiv.org/abs/2003.13414v1
PDF https://arxiv.org/pdf/2003.13414v1.pdf
PWC https://paperswithcode.com/paper/using-news-articles-and-financial-data-to

Dataless Model Selection with the Deep Frame Potential

Title Dataless Model Selection with the Deep Frame Potential
Authors Calvin Murdock, Simon Lucey
Abstract Choosing a deep neural network architecture is a fundamental problem in applications that require balancing performance and parameter efficiency. Standard approaches rely on ad-hoc engineering or computationally expensive validation on a specific dataset. We instead attempt to quantify networks by their intrinsic capacity for unique and robust representations, enabling efficient architecture comparisons without requiring any data. Building upon theoretical connections between deep learning and sparse approximation, we propose the deep frame potential: a measure of coherence that is approximately related to representation stability but has minimizers that depend only on network structure. This provides a framework for jointly quantifying the contributions of architectural hyper-parameters such as depth, width, and skip connections. We validate its use as a criterion for model selection and demonstrate correlation with generalization error on a variety of common residual and densely connected network architectures.
Tasks Model Selection
Published 2020-03-30
URL https://arxiv.org/abs/2003.13866v1
PDF https://arxiv.org/pdf/2003.13866v1.pdf
PWC https://paperswithcode.com/paper/dataless-model-selection-with-the-deep-frame
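The deep frame potential extends the classical frame potential to the structured matrices induced by a network architecture. The sketch below computes only the classical quantity for a single set of vectors, assuming it is defined as the sum of squared inner products between all pairs of unit-normalized columns. Lower values mean lower mutual coherence; for n unit vectors in d dimensions the minimum is n²/d when n ≥ d, reached by tight frames such as three vectors spaced 120° apart in the plane. Function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def frame_potential(columns):
    """Classical frame potential: sum over all pairs (i, j), including i == j,
    of the squared inner product between unit-normalized columns."""
    cols = [normalize(c) for c in columns]
    total = 0.0
    for a in cols:
        for b in cols:
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return total

if __name__ == "__main__":
    orthonormal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    mercedes = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)]
                for k in range(3)]
    redundant = [[1, 1]] * 3  # three copies of the same direction
    for name, cols in [("orthonormal", orthonormal), ("120-degree frame", mercedes),
                       ("repeated", redundant)]:
        print(name, round(frame_potential(cols), 6))
```

Repeated directions give the maximum potential (n²), which is the intuition behind using such a coherence measure to compare architectures without data.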

Memory-Constrained No-Regret Learning in Adversarial Bandits

Title Memory-Constrained No-Regret Learning in Adversarial Bandits
Authors Xiao Xu, Qing Zhao
Abstract An adversarial bandit problem with memory constraints is studied where only the statistics of a subset of arms can be stored. A hierarchical learning policy that requires only a sublinear order of memory space in terms of the number of arms is developed. Its sublinear regret orders with respect to the time horizon are established for both weak regret and shifting regret. This work appears to be the first on memory-constrained bandit problems under the adversarial setting.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11804v1
PDF https://arxiv.org/pdf/2002.11804v1.pdf
PWC https://paperswithcode.com/paper/memory-constrained-no-regret-learning-in
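For context, the sketch below implements the classical EXP3 algorithm for adversarial bandits. It is a swapped-in baseline, not the hierarchical policy from the paper: EXP3 keeps one weight per arm, which is exactly the linear memory footprint the paper's policy reduces to a sublinear order. The toy reward function and the values of `gamma` and `horizon` are illustrative assumptions.

```python
import math
import random

def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """Standard EXP3 for adversarial bandits.

    reward_fn(t, arm) must return a reward in [0, 1]. Returns the arm
    probabilities used in the last round and the total collected reward.
    Memory: one weight per arm, i.e. linear in n_arms.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    probs = [1.0 / n_arms] * n_arms
    total = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix exponential weights with uniform exploration (rate gamma).
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance-weighted reward estimate; only the pulled arm changes.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1; probabilities depend only on ratios.
        top = max(weights)
        weights = [w / top for w in weights]
    return probs, total

if __name__ == "__main__":
    # Toy adversary that always rewards arm 2 (a stand-in, not from the paper).
    probs, total = exp3(lambda t, arm: 1.0 if arm == 2 else 0.0,
                        n_arms=3, horizon=2000)
    print([round(p, 3) for p in probs], total)
```

The rescaling step avoids floating-point overflow over long horizons without changing the sampling distribution.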

Digital Collaborator: Augmenting Task Abstraction in Visualization Design with Artificial Intelligence

Title Digital Collaborator: Augmenting Task Abstraction in Visualization Design with Artificial Intelligence
Authors Aditeya Pandey, Yixuan Zhang, John A. Guerra-Gomez, Andrea G. Parker, Michelle A. Borkin
Abstract In the task abstraction phase of the visualization design process, including in “design studies”, a practitioner maps the observed domain goals to generalizable abstract tasks using visualization theory in order to better understand and address the users’ needs. We argue that this manual task abstraction process is prone to errors due to designer biases and a lack of domain background and knowledge. Under these circumstances, a collaborator can help validate and provide sanity checks to visualization practitioners during this important task abstraction stage. However, having a human collaborator is not always feasible and may be subject to the same biases and pitfalls. In this paper, we first describe the challenges associated with task abstraction. We then propose a conceptual Digital Collaborator: an artificial intelligence system that aims to help visualization practitioners by augmenting their ability to validate and reason about the output of task abstraction. We also discuss several practical challenges of designing and implementing such systems.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01304v1
PDF https://arxiv.org/pdf/2003.01304v1.pdf
PWC https://paperswithcode.com/paper/digital-collaborator-augmenting-task

Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning

Title Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning
Authors Chaoyue Liu, Libin Zhu, Mikhail Belkin
Abstract The success of deep learning is due, to a great extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this work we isolate some general mathematical structures allowing for efficient optimization in over-parameterized systems of non-linear equations, a setting that includes deep neural networks. In particular, we show that optimization problems corresponding to such systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition, allowing for efficient optimization by gradient descent or SGD. We connect the PL condition of these systems to the condition number associated with the tangent kernel and develop a non-linear theory parallel to classical analyses of over-parameterized linear equations. We discuss how these ideas apply to training shallow and deep neural networks. Finally, we point out that tangent kernels associated with certain large systems may be far from constant, even locally. Yet, our analysis still allows us to demonstrate the existence of solutions and the convergence of gradient descent and SGD.
Published 2020-02-29
URL https://arxiv.org/abs/2003.00307v1
PDF https://arxiv.org/pdf/2003.00307v1.pdf
PWC https://paperswithcode.com/paper/toward-a-theory-of-optimization-for-over
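The central claim here is that the PL condition, rather than convexity, is what makes gradient descent succeed. A commonly used one-dimensional illustration (chosen here as an assumption, not taken from the paper) is f(x) = x² + 3 sin²(x): it is non-convex, since f''(π/2) = -4, yet it is reported to satisfy the PL inequality ½|f'(x)|² ≥ μ f(x) with μ = 1/32, so gradient descent with step 1/L (L = 8 bounds |f''|) still reaches the global minimum at 0.

```python
import math

def f(x):
    # Non-convex objective that satisfies the PL inequality.
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 6 sin x cos x = 2x + 3 sin 2x
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def second_derivative(x):
    # f''(x) = 2 + 6 cos 2x, which ranges over [-4, 8].
    return 2.0 + 6.0 * math.cos(2.0 * x)

def gradient_descent(x0, steps=200, lr=1.0 / 8.0):
    # lr = 1/L with L = 8, an upper bound on |f''(x)|.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

if __name__ == "__main__":
    print("f''(pi/2) =", second_derivative(math.pi / 2))  # negative: not convex
    x_final = gradient_descent(3.0)
    print("x =", x_final, "f(x) =", f(x_final))
```

In the paper the PL constant is tied to the conditioning of the tangent kernel of a large system; this toy function only shows the convexity-free convergence behaviour in one dimension.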

Heat and Blur: An Effective and Fast Defense Against Adversarial Examples

Title Heat and Blur: An Effective and Fast Defense Against Adversarial Examples
Authors Haya Brama, Tal Grinshpoun
Abstract The growing incorporation of artificial neural networks (NNs) into many fields, and especially into life-critical systems, is restrained by their vulnerability to adversarial examples (AEs). Some existing defense methods can increase NNs’ robustness, but they often require special architectures or training procedures and cannot be applied to already trained models. In this paper, we propose a simple defense that combines feature visualization with input modification and is therefore applicable to various pre-trained networks. By reviewing several interpretability methods, we gain new insights regarding the influence of AEs on NNs’ computation. Based on that, we hypothesize that information about the “true” object is preserved within the NN’s activity, even when the input is adversarial, and present a feature visualization version that can extract that information in the form of relevance heatmaps. We then use these heatmaps as a basis for our defense, in which the adversarial effects are corrupted by massive blurring. We also provide a new evaluation metric that can capture the effects of both attacks and defenses more thoroughly and descriptively, and demonstrate the effectiveness of the defense and the utility of the suggested evaluation metric with VGG19 results on the ImageNet dataset.
Published 2020-03-17
URL https://arxiv.org/abs/2003.07573v1
PDF https://arxiv.org/pdf/2003.07573v1.pdf
PWC https://paperswithcode.com/paper/heat-and-blur-an-effective-and-fast-defense
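The defense corrupts adversarial effects by heavy blurring. The exact filter and how it is applied to the relevance heatmaps are described in the paper; the sketch below only shows a generic mean-filter (box) blur on a 2D grid, as an assumed stand-in for whatever blur the authors use (a Gaussian filter would be a common alternative). The `radius` parameter is hypothetical.

```python
def box_blur(image, radius=1):
    """Blur a 2D grid (list of lists of floats) with a (2r+1) x (2r+1) mean filter.

    Border pixels average only over the neighbours that lie inside the image.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x]
                        count += 1
            out[i][j] = total / count
    return out

if __name__ == "__main__":
    # A single bright pixel (e.g. a sharp high-frequency perturbation)
    # is spread over its 3 x 3 neighbourhood.
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 9.0
    for row in box_blur(img):
        print(row)
```

Larger radii (or repeated passes) give the "massive" blurring the abstract describes, at the cost of also removing fine image detail.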

Learning Parities with Neural Networks

Title Learning Parities with Neural Networks
Authors Amit Daniely, Eran Malach
Abstract In recent years we have seen a rapidly growing line of research which shows the learnability of various models via common neural network algorithms. Yet, with very few exceptions, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural networks with gradient descent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as neural networks are far more successful than linear methods. Furthermore, on a more conceptual level, linear models don’t seem to capture the “deepness” of deep networks. In this paper we make a step towards showing the learnability of models that are inherently non-linear. We show that under certain distributions, sparse parities are learnable via gradient descent on depth-two networks. On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07400v1
PDF https://arxiv.org/pdf/2002.07400v1.pdf
PWC https://paperswithcode.com/paper/learning-parities-with-neural-networks
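To make "sparse parity" concrete, the sketch below defines the parity of a subset S of input bits (encoded as ±1) and computes, exactly under the uniform distribution, its correlation with each individual input bit. It illustrates only the simplest intuition for why parities are hard for linear methods: a parity over two or more bits has zero correlation with every single coordinate. The paper's actual result concerns specific distributions and linear classifiers on data-independent representations, which this sketch does not reproduce. Function names are hypothetical.

```python
from itertools import product

def parity(bits, subset):
    """Parity of the bits indexed by `subset`, encoded as +1 (even) / -1 (odd)."""
    return -1 if sum(bits[i] for i in subset) % 2 else 1

def correlations_with_single_bits(n, subset):
    """Exact E[parity_S(x) * (2 * x_i - 1)] for each coordinate i,
    with x uniform over {0, 1}^n (enumerated exhaustively)."""
    points = list(product([0, 1], repeat=n))
    corrs = []
    for i in range(n):
        total = sum(parity(x, subset) * (2 * x[i] - 1) for x in points)
        corrs.append(total / len(points))
    return corrs

if __name__ == "__main__":
    print(correlations_with_single_bits(4, [0, 2]))  # parity of bits 0 and 2
    print(correlations_with_single_bits(3, [1]))     # a 1-sparse "parity"
```

Only the degenerate one-bit case shows a non-zero correlation; for larger subsets no single coordinate carries any signal on its own.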

Initial Design Strategies and their Effects on Sequential Model-Based Optimization

Title Initial Design Strategies and their Effects on Sequential Model-Based Optimization
Authors Jakob Bossek, Carola Doerr, Pascal Kerschke
Abstract Sequential model-based optimization (SMBO) approaches are algorithms for solving problems that require computationally or otherwise expensive function evaluations. The key design principle of SMBO is the substitution of the true objective function by a surrogate, which is used to propose the point(s) to be evaluated next. SMBO algorithms are intrinsically modular, leaving the user with many important design choices. Significant research efforts go into understanding which settings perform best for which types of problems. Most works, however, focus on the choice of the model, the acquisition function, and the strategy used to optimize the latter. The choice of the initial sampling strategy receives much less attention. Not surprisingly, widely diverging recommendations can be found in the literature. In this work, we analyze how the size and the distribution of the initial sample influence the overall quality of the efficient global optimization (EGO) algorithm, a well-known SMBO approach. While, overall, small initial budgets using Halton sampling seem preferable, we also observe that the performance landscape is rather unstructured. We furthermore identify several situations in which EGO performs unfavorably against random sampling. Both observations indicate that an adaptive SMBO design could be beneficial, making SMBO an interesting test-bed for automated algorithm design.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13826v1
PDF https://arxiv.org/pdf/2003.13826v1.pdf
PWC https://paperswithcode.com/paper/initial-design-strategies-and-their-effects
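Since the study finds small Halton-sampled initial designs preferable overall, here is a minimal sketch of generating a Halton initial design. It uses the standard radical-inverse construction with one distinct prime base per dimension; the helper names, the default bases (2, 3), and the box-scaling step are illustrative assumptions rather than details from the paper.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` around the radix point."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * frac
        frac /= base
    return result

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence in [0, 1)^d, d = len(bases).

    Bases should be distinct primes; indexing starts at 1 to skip the origin.
    """
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_points + 1)]

def scale(points, lower, upper):
    """Map points from the unit cube to the box [lower, upper]."""
    return [[lo + v * (hi - lo) for v, lo, hi in zip(p, lower, upper)]
            for p in points]

if __name__ == "__main__":
    # A small initial design of 5 points for a 2D search space [-5, 5] x [0, 10].
    for p in scale(halton(5), lower=[-5.0, 0.0], upper=[5.0, 10.0]):
        print(p)
```

Unlike uniform random sampling, consecutive Halton points fill the space evenly, which is one reason it is a popular choice for initial designs.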

Anytime and Efficient Coalition Formation with Spatial and Temporal Constraints

Title Anytime and Efficient Coalition Formation with Spatial and Temporal Constraints
Authors Luca Capezzuto, Danesh Tarapore, Sarvapali D. Ramchurn
Abstract The Coalition Formation with Spatial and Temporal constraints Problem (CFSTP) is a multi-agent task allocation problem in which the agents are cooperative and few, the tasks are many, spatially distributed, and have deadlines and workloads, and the objective is to find a schedule that maximises the number of completed tasks. The current state-of-the-art CFSTP solver, the Coalition Formation with Look-Ahead (CFLA) algorithm, has two main limitations. First, its time complexity is quadratic in the number of tasks and exponential in the number of agents, which makes it inefficient. Second, its look-ahead technique is not effective in real-world scenarios, such as open multi-agent systems, where new tasks can appear at any time. Motivated by this, we propose an extension of CFLA, which we call Coalition Formation with Improved Look-Ahead (CFLA+). Since CFLA+ inherits the limitations of CFLA, we also develop a novel algorithm to solve the CFSTP, the first to be both anytime and efficient, which we call Clustered-based Coalition Formation (CCF). We empirically show that, in settings where the look-ahead technique is highly effective, CCF completes up to 20% (resp. 10%) more tasks than CFLA (resp. CFLA+) while being up to four orders of magnitude faster. Our results affirm CCF as the new state-of-the-art CFSTP solver.
Published 2020-03-30
URL https://arxiv.org/abs/2003.13806v1
PDF https://arxiv.org/pdf/2003.13806v1.pdf
PWC https://paperswithcode.com/paper/anytime-and-efficient-coalition-formation

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units

Title Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units
Authors Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, Shiliang Pu, Yi Niu, Fei Wu
Abstract Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on a gating mechanism that controls how information flows between hidden states. However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from gate undertraining, mainly due to their saturating activation functions, which may cause the gates to fail to learn their gating roles and thus lead to weak performance. In this paper, we propose a new gating mechanism within general gated recurrent neural networks to handle this issue. Specifically, the proposed gates, denoted as refined gates, directly short-connect the extracted input features to the outputs of the vanilla gates. This refining mechanism enhances gradient back-propagation and extends the gating activation scope, which, although simple, can guide an RNN to reach possibly deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs: LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.
Tasks Language Modelling, Scene Text Recognition
Published 2020-02-26
URL https://arxiv.org/abs/2002.11338v1
PDF https://arxiv.org/pdf/2002.11338v1.pdf
PWC https://paperswithcode.com/paper/refined-gate-a-simple-and-effective-gating
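The abstract attributes gate undertraining mainly to saturating activation functions. The sketch below shows that effect for the sigmoid, the usual gate activation: its derivative peaks at 0.25 and shrinks rapidly as the pre-activation grows, so a gate pushed into saturation receives very little gradient. It does not implement the refined gate itself, whose exact formulation is given in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); maximum 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  gate={sigmoid(z):.5f}  d gate/dz={sigmoid_grad(z):.2e}")
```

At z = 10 the gradient is already below 1e-4, which is the regime in which a gate "stops learning" and motivates adding a more direct gradient path.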

Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment

Title Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment
Authors Youngnam Lee, Dongmin Shin, HyunBin Loh, Jaemin Lee, Piljae Chae, Junghyun Cho, Seoyon Park, Jinhwan Lee, Jineon Baek, Byungsoo Kim, Youngduck Choi
Abstract Student dropout prediction provides an opportunity to improve student engagement, which maximizes the overall effectiveness of learning experiences. However, research on student dropout has mainly focused on school or course dropout, and study session dropout in a mobile learning environment has not been considered thoroughly. In this paper, we investigate the study session dropout prediction problem in a mobile learning environment. First, we define the concepts of a study session, study session dropout and the study session dropout prediction task in a mobile learning environment. Based on these definitions, we propose DAS (Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment), a novel Transformer-based model for predicting study session dropout. DAS has an encoder-decoder structure composed of stacked multi-head attention and point-wise feed-forward networks. The deep attentive computations in DAS are capable of capturing complex relations among dynamic student interactions. To the best of our knowledge, this is the first attempt to investigate study session dropout in a mobile learning environment. Empirical evaluations on a large-scale dataset show that DAS achieves the best performance, with a significant improvement in the area under the receiver operating characteristic curve compared to baseline models.
Published 2020-02-14
URL https://arxiv.org/abs/2002.11624v1
PDF https://arxiv.org/pdf/2002.11624v1.pdf
PWC https://paperswithcode.com/paper/deep-attentive-study-session-dropout
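DAS is built from stacked multi-head attention and point-wise feed-forward layers. As a hedged illustration of the core operation only, the sketch below computes single-head scaled dot-product attention, softmax(QKᵀ/√d)V, on plain Python lists. It omits learned projections, multiple heads, masking, and the encoder-decoder wiring used in DAS; the example inputs are arbitrary.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention on lists: softmax(Q K^T / sqrt(d)) V.

    Returns the attended outputs and the attention weights per query.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

if __name__ == "__main__":
    outs, ws = scaled_dot_product_attention(
        queries=[[1.0, 0.0]],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[1.0, 0.0], [0.0, 1.0]],
    )
    print(ws, outs)
```

In a dropout-prediction setting, each position would correspond to one student interaction, and the attention weights express how much each past interaction contributes to the current representation.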

Deep Collaborative Embedding for information cascade prediction

Title Deep Collaborative Embedding for information cascade prediction
Authors Yuhui Zhao, Ning Yang, Tao Lin, Philip S. Yu
Abstract Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of existing works. First, existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, existing works often ignore the prediction of the infection order, which also plays an important role in social network analysis. Finally, existing works often require the underlying diffusion networks, which are likely unobservable in practice. In this paper, we aim to predict both node infection and infection order without requiring knowledge of the underlying diffusion mechanism or the diffusion network, where the challenges are two-fold. The first is which cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can capture not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework to learn the node embeddings with cascade collaboration and node collaboration, so that the non-linearity of information cascades can be effectively captured. The results of extensive experiments conducted on real-world datasets verify the effectiveness of our approach.
Published 2020-01-18
URL https://arxiv.org/abs/2001.06665v1
PDF https://arxiv.org/pdf/2001.06665v1.pdf
PWC https://paperswithcode.com/paper/deep-collaborative-embedding-for-information

Estimating Basis Functions in Massive Fields under the Spatial Mixed Effects Model

Title Estimating Basis Functions in Massive Fields under the Spatial Mixed Effects Model
Authors Karl T. Pazdernik, Ranjan Maitra
Abstract Spatial prediction is commonly achieved under the assumption of a Gaussian random field (GRF) by obtaining maximum likelihood estimates of parameters, and then using the kriging equations to arrive at predicted values. For massive datasets, fixed rank kriging using the Expectation-Maximization (EM) algorithm for estimation has been proposed as an alternative to the usual but computationally prohibitive kriging method. The method reduces the computational cost of estimation by redefining the spatial process as a linear combination of basis functions and spatial random effects. A disadvantage of this method is that it imposes constraints on the relationship between the observed locations and the knots. We develop an alternative method that utilizes the Spatial Mixed Effects (SME) model, but allows for additional flexibility by estimating the range of the spatial dependence between the observations and the knots via an Alternating Expectation Conditional Maximization (AECM) algorithm. Experiments show that our methodology improves estimation without sacrificing prediction accuracy while also minimizing the additional computational burden of extra parameter estimation. The methodology is applied to a temperature data set archived by the United States National Climatic Data Center, with improved results over previous methodology.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05990v1
PDF https://arxiv.org/pdf/2003.05990v1.pdf
PWC https://paperswithcode.com/paper/estimating-basis-functions-in-massive-fields
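Fixed rank kriging writes the spatial process as a linear combination of r basis functions centred at knots, so the n x r basis matrix S replaces the full n x n covariance computation. The sketch below builds S with bisquare basis functions, a common choice in the fixed rank kriging literature, assuming the `radius` parameter plays the role of the range between observations and knots that this paper estimates with AECM. The estimation itself is not shown; the radius is simply given, and all names are hypothetical.

```python
import math

def bisquare(distance, radius):
    """Bisquare basis function: (1 - (d/r)^2)^2 inside the radius, 0 outside."""
    if distance >= radius:
        return 0.0
    u = distance / radius
    return (1.0 - u * u) ** 2

def basis_matrix(locations, knots, radius):
    """n x r matrix S with S[i][k] = bisquare(||s_i - knot_k||, radius)."""
    return [[bisquare(math.dist(s, k), radius) for k in knots] for s in locations]

if __name__ == "__main__":
    locations = [(0.0, 0.0), (0.5, 0.5), (3.0, 0.0)]
    knots = [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)]
    for row in basis_matrix(locations, knots, radius=2.0):
        print([round(v, 4) for v in row])
```

Because each basis function has compact support, changing the radius directly changes which observations each knot influences, which is why estimating it adds flexibility over a fixed choice.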

ResNets, NeuralODEs and CT-RNNs are Particular Neural Regulatory Networks

Title ResNets, NeuralODEs and CT-RNNs are Particular Neural Regulatory Networks
Authors Radu Grosu
Abstract This paper shows that ResNets, NeuralODEs, and CT-RNNs are particular neural regulatory networks (NRNs), a biophysical model for the nonspiking neurons encountered in small species, such as the C. elegans nematode, and in the retina of large species. Compared to ResNets, NeuralODEs and CT-RNNs, NRNs have an additional multiplicative term in their synaptic computation, allowing them to adapt to each particular input. This additional flexibility makes NRNs $M$ times more succinct than NeuralODEs and CT-RNNs, where $M$ is proportional to the size of the training set. Moreover, as NeuralODEs and CT-RNNs are $N$ times more succinct than ResNets, where $N$ is the number of integration steps required to compute the output $F(x)$ for a given input $x$, NRNs are in total $M \cdot N$ times more succinct than ResNets. For a given approximation task, this considerable succinctness makes it possible to learn a very small and therefore understandable NRN, whose behavior can be explained in terms of well-established architectural motifs that NRNs share with gene regulatory networks, such as activation, inhibition, sequentialization, mutual exclusion, and synchronization. To the best of our knowledge, this paper is the first to unify, in a quantitative fashion, mainstream work on deep neural networks with work in biology and neuroscience.
Published 2020-02-26
URL https://arxiv.org/abs/2002.12776v3
PDF https://arxiv.org/pdf/2002.12776v3.pdf
PWC https://paperswithcode.com/paper/resnets-neuralodes-and-ct-rnns-are-particular
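The factor $N$ above rests on a well-known correspondence: a residual update x ← x + h·f(x) is one forward-Euler step of the ODE dx/dt = f(x), so integrating a NeuralODE with $N$ steps unrolls into $N$ residual blocks. The sketch below illustrates this with a toy vector field f(x) = -x (an assumption chosen because its exact solution, x₀e^{-t}, is known). It does not model the multiplicative synapses that distinguish NRNs.

```python
import math

def euler_integrate(f, x0, t_end=1.0, n_steps=10):
    """Forward Euler for dx/dt = f(x). Each step x <- x + h * f(x) has the
    same form as a residual block, so n_steps steps act like n_steps
    residual blocks that share the same weights."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x)  # one "residual block"
    return x

def decay(x):
    # Toy vector field; exact solution x(t) = x0 * exp(-t).
    return -x

if __name__ == "__main__":
    exact = math.exp(-1.0)
    for n in (1, 10, 100, 1000):
        approx = euler_integrate(decay, 1.0, n_steps=n)
        print(f"steps={n:5d}  x(1)={approx:.6f}  error={abs(approx - exact):.2e}")
```

The error shrinks as the number of steps grows, which mirrors the paper's accounting: a ResNet needs one block per integration step, whereas a continuous-time model describes the same computation with a single vector field.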