January 26, 2020


Paper Group ANR 1424


What Can ResNet Learn Efficiently, Going Beyond Kernels?. IRF: Interactive Recommendation through Dialogue. Parallelized Training of Restricted Boltzmann Machines using Markov-Chain Monte Carlo Methods. It’s Not Whom You Know, It’s What You (or Your Friends) Can Do: Succint Coalitional Frameworks for Network Centralities. Parallelising MCMC via Ran …

What Can ResNet Learn Efficiently, Going Beyond Kernels?

Title What Can ResNet Learn Efficiently, Going Beyond Kernels?
Authors Zeyuan Allen-Zhu, Yuanzhi Li
Abstract How can neural networks such as ResNet efficiently learn CIFAR-10 with test accuracy above 96%, while other methods, especially kernel methods, fall relatively behind? Can we provide more theoretical justification for this gap? Recently, an influential line of work has related neural networks to kernels in the over-parameterized regime, proving that they can learn certain concept classes that are also learnable by kernels with similar test error. Yet, can neural networks provably learn some concept class $\textit{better}$ than kernels? We answer this positively in the PAC-learning language. We prove that neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption. At the same time, we prove there are simple functions in this class such that, with the same number of training examples, the test error obtained by neural networks can be $\textit{much smaller}$ than that of $\textit{any}$ kernel method, including neural tangent kernels (NTK). The main intuition is that multi-layer neural networks can implicitly perform hierarchical learning using different layers, which reduces the sample complexity compared to “one-shot” learning algorithms such as kernel methods. Finally, we also prove a computational complexity advantage of ResNet over other learning methods, including linear regression over arbitrary feature mappings.
Tasks One-Shot Learning
Published 2019-05-24
URL https://arxiv.org/abs/1905.10337v2
PDF https://arxiv.org/pdf/1905.10337v2.pdf
PWC https://paperswithcode.com/paper/what-can-resnet-learn-efficiently-going
Repo
Framework

IRF: Interactive Recommendation through Dialogue

Title IRF: Interactive Recommendation through Dialogue
Authors Oznur Alkan, Massimiliano Mattetti, Elizabeth M. Daly, Adi Botea, Inge Vejsbjerg
Abstract Recent research looks beyond recommendation accuracy toward human factors that influence the acceptance of recommendations, such as user satisfaction, trust, transparency and sense of control. We present a generic interactive recommender framework that can add interaction functionalities to non-interactive recommender systems. We take advantage of dialogue systems to interact with the user, and we design a middleware layer to provide the interaction functions, such as providing explanations for the recommendations, managing users’ preferences learnt from dialogue, eliciting preferences and refining recommendations based on learnt preferences.
Tasks
Published 2019-10-03
URL https://arxiv.org/abs/1910.03040v1
PDF https://arxiv.org/pdf/1910.03040v1.pdf
PWC https://paperswithcode.com/paper/irf-interactive-recommendation-through
Repo
Framework

Parallelized Training of Restricted Boltzmann Machines using Markov-Chain Monte Carlo Methods

Title Parallelized Training of Restricted Boltzmann Machines using Markov-Chain Monte Carlo Methods
Authors Pei Yang, Srinivas Varadharajan, Lucas A. Wilson, Don D. Smith II, John A Lockman III, Vineet Gundecha, Quy Ta
Abstract The Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that can be applied to the collaborative filtering technique used by recommendation systems. The prediction accuracy of the RBM model is usually better than that of other models for recommendation systems. However, training the RBM model involves Markov-Chain Monte Carlo (MCMC) methods, which are computationally expensive. In this paper, we successfully apply distributed parallel training using the Horovod framework to reduce the training time of the RBM model. Our tests show that the distributed training approach for the RBM model has good scaling efficiency. We also show that this approach reduces the training time to a little over 12 minutes on 64 CPU nodes, compared to 5 hours on a single CPU node. This will make RBM models more practically applicable in recommendation systems.
Tasks Recommendation Systems
Published 2019-10-14
URL https://arxiv.org/abs/1910.05885v1
PDF https://arxiv.org/pdf/1910.05885v1.pdf
PWC https://paperswithcode.com/paper/parallelized-training-of-restricted-boltzmann
Repo
Framework
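The abstract does not spell out the training loop, so the following is a minimal sketch under our own assumptions: a binary RBM trained with one-step contrastive divergence (CD-1), which runs a single Gibbs (MCMC) step per update, plus data-parallel workers whose gradients are averaged before each update. The `allreduce_mean` helper is a hypothetical plain-Python stand-in for the allreduce that Horovod performs; no Horovod API is used, and the toy patterns, sizes and learning rate are illustrative only.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary RBM trained with CD-1 (one Gibbs step per update)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_gradients(self, shard):
        """Positive-phase minus negative-phase statistics, averaged over one shard."""
        n = len(shard)
        gW = [[0.0] * self.nh for _ in range(self.nv)]
        gb, gc = [0.0] * self.nv, [0.0] * self.nh
        for v0 in shard:
            ph0 = self.hidden_probs(v0)
            v1 = self.sample(self.visible_probs(self.sample(ph0)))  # one Gibbs step
            ph1 = self.hidden_probs(v1)
            for i in range(self.nv):
                gb[i] += (v0[i] - v1[i]) / n
                for j in range(self.nh):
                    gW[i][j] += (v0[i] * ph0[j] - v1[i] * ph1[j]) / n
            for j in range(self.nh):
                gc[j] += (ph0[j] - ph1[j]) / n
        return gW, gb, gc

    def apply(self, grads, lr):
        gW, gb, gc = grads
        for i in range(self.nv):
            self.b[i] += lr * gb[i]
            for j in range(self.nh):
                self.W[i][j] += lr * gW[i][j]
        for j in range(self.nh):
            self.c[j] += lr * gc[j]

    def reconstruction_error(self, data):
        # Mean squared error of a deterministic (mean-field) reconstruction.
        total = 0.0
        for v in data:
            r = self.visible_probs(self.hidden_probs(v))
            total += sum((x - y) ** 2 for x, y in zip(v, r))
        return total / len(data)


def allreduce_mean(worker_grads):
    """Hypothetical stand-in for an allreduce: element-wise mean of all workers' gradients."""
    k = len(worker_grads)
    nv, nh = len(worker_grads[0][1]), len(worker_grads[0][2])
    gW = [[sum(g[0][i][j] for g in worker_grads) / k for j in range(nh)] for i in range(nv)]
    gb = [sum(g[1][i] for g in worker_grads) / k for i in range(nv)]
    gc = [sum(g[2][j] for g in worker_grads) / k for j in range(nh)]
    return gW, gb, gc


data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4
shards = [data[k::4] for k in range(4)]  # four simulated workers
rbm = RBM(n_visible=6, n_hidden=3)
before = rbm.reconstruction_error(data)
for step in range(300):
    rbm.apply(allreduce_mean([rbm.cd1_gradients(s) for s in shards]), lr=0.5)
after = rbm.reconstruction_error(data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

Because every worker holds a copy of the same parameters and the averaged gradient equals the full-batch gradient, the update is the same as single-node CD-1; only the per-shard Gibbs sampling is spread across workers.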

It’s Not Whom You Know, It’s What You (or Your Friends) Can Do: Succint Coalitional Frameworks for Network Centralities

Title It’s Not Whom You Know, It’s What You (or Your Friends) Can Do: Succint Coalitional Frameworks for Network Centralities
Authors Gabriel Istrate, Cosmin Bonchis, Claudiu Gatina
Abstract We investigate the representation of measures of network centrality using a framework that blends a social network representation with the succinct formalism of cooperative skill games. We discuss the expressiveness of the new framework and highlight some of its advantages, including a fixed-parameter tractability result for computing centrality measures under such representations. As an application, we introduce new network centrality measures that capture the extent to which neighbors of a certain node can help it complete relevant tasks.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.11084v1
PDF https://arxiv.org/pdf/1909.11084v1.pdf
PWC https://paperswithcode.com/paper/its-not-whom-you-know-its-what-you-or-your
Repo
Framework
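To make the skill-game idea concrete, here is a minimal sketch. It assumes the usual skill-game value (a coalition completes a task when its members' pooled skills cover the task's requirements) and a hypothetical `neighbor_help` measure that counts the tasks a node can complete only together with its neighbors; the paper's actual centrality measures may be defined differently.

```python
def covered_tasks(coalition, skills, tasks):
    """Skill-game value: tasks whose required skills are all held within the coalition."""
    pooled = set()
    for node in coalition:
        pooled |= skills[node]
    return {task for task, required in tasks.items() if required <= pooled}


def neighbor_help(graph, skills, tasks):
    """Hypothetical measure: tasks a node completes with its neighbors but not alone."""
    scores = {}
    for v, neighbors in graph.items():
        alone = covered_tasks({v}, skills, tasks)
        together = covered_tasks({v} | neighbors, skills, tasks)
        scores[v] = len(together - alone)
    return scores


graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
skills = {"a": {"python"}, "b": {"sql"}, "c": {"stats"}}
tasks = {"report": {"python", "sql"}, "model": {"python", "stats"}, "etl": {"sql"}}
print(neighbor_help(graph, skills, tasks))  # {'a': 2, 'b': 2, 'c': 1}
```

Node `b` can already run `etl` alone, but its neighbors let it also complete `report` and `model`, so it scores the same as `a`, whose only neighbor unlocks two tasks.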

Parallelising MCMC via Random Forests

Title Parallelising MCMC via Random Forests
Authors Wu Changye, Christian P. Robert
Abstract For Bayesian computation in big data contexts, the divide-and-conquer MCMC concept splits the whole data set into batches, runs MCMC algorithms separately over each batch to produce samples of parameters, and combines them to produce an approximation of the target distribution. In this article, we embed random forests into this framework and use each subposterior/partial-posterior as a proposal distribution to implement importance sampling. Unlike existing divide-and-conquer MCMC methods, ours are based on scaled subposteriors, whose scale factors are not necessarily restricted to being equal to one or to the number of subsets. Through several experiments, we show that our methods work well with models ranging from Gaussian cases to strongly non-Gaussian cases, including model misspecification.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09698v1
PDF https://arxiv.org/pdf/1911.09698v1.pdf
PWC https://paperswithcode.com/paper/parallelising-mcmc-via-random-forests
Repo
Framework
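A minimal sketch of the importance-sampling step, under simplifying assumptions: the model is x ~ N(θ, 1) with a flat prior, exact Gaussian subposterior draws stand in for per-batch MCMC output, and the random-forest component of the paper is omitted. The `scale` parameter (a hypothetical name) raises each batch likelihood to a power, mimicking a scaled subposterior.

```python
import math
import random
import statistics


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def subposterior_is_mean(data, n_batches=4, scale=1.0, n_draws=5000, seed=0):
    """Estimate the full-data posterior mean of theta in x ~ N(theta, 1), flat prior.

    Each batch's (scaled) subposterior is used as an importance-sampling proposal
    for the full posterior; the per-batch estimates are then averaged.
    """
    rng = random.Random(seed)
    n, xbar = len(data), statistics.fmean(data)
    size = n // n_batches
    estimates = []
    for k in range(n_batches):
        batch = data[k * size:(k + 1) * size]
        # Exact Gaussian subposterior (stands in for MCMC draws on this batch);
        # raising the batch likelihood to `scale` gives N(mean_k, 1 / (scale * n_k)).
        mean_k, var_k = statistics.fmean(batch), 1.0 / (scale * len(batch))
        draws = [rng.gauss(mean_k, math.sqrt(var_k)) for _ in range(n_draws)]
        # log weight = unnormalized full-data log posterior - log proposal density.
        log_w = [-0.5 * n * (t - xbar) ** 2 - normal_logpdf(t, mean_k, var_k) for t in draws]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]  # shift before exp for stability
        estimates.append(sum(wi * t for wi, t in zip(w, draws)) / sum(w))
    return statistics.fmean(estimates)


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
estimate = subposterior_is_mean(data)
exact = statistics.fmean(data)  # exact posterior mean under the flat prior
print(f"IS estimate {estimate:.4f}, exact {exact:.4f}")
```

The full log posterior is written through the sufficient statistic, since the sum of (x_i - θ)² equals n(θ - x̄)² plus a term that does not depend on θ. With `scale=1.0` each proposal is wider than the full posterior, which keeps the importance weights bounded.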

Deep Active Learning: Unified and Principled Method for Query and Training

Title Deep Active Learning: Unified and Principled Method for Query and Training
Authors Changjian Shui, Fan Zhou, Christian Gagné, Boyu Wang
Abstract In this paper, we propose a unified and principled method for both the querying and training processes in deep batch active learning. We provide theoretical insights by modeling the interactive procedure in active learning as distribution matching, adopting the Wasserstein distance. From this theoretical analysis we derive a new training loss, which decomposes into optimizing the deep neural network parameters and selecting the query batch, solved through alternating optimization. In addition, the loss for training a deep neural network is naturally formulated as a min-max optimization problem that leverages information from the unlabeled data. Moreover, the proposed principles also indicate an explicit uncertainty-diversity trade-off in query batch selection. Finally, we evaluate our proposed method on different benchmarks, consistently showing better empirical performance and a more time-efficient query strategy compared to the baselines.
Tasks Active Learning
Published 2019-11-20
URL https://arxiv.org/abs/1911.09162v2
PDF https://arxiv.org/pdf/1911.09162v2.pdf
PWC https://paperswithcode.com/paper/deep-active-learning-unified-and-principled
Repo
Framework

The Effectiveness of Variational Autoencoders for Active Learning

Title The Effectiveness of Variational Autoencoders for Active Learning
Authors Farhad Pourkamali-Anaraki, Michael B. Wakin
Abstract The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting which examples from a large pool of unlabeled instances should be labeled for training. In this paper, we propose a new data-driven approach to active learning that selects a small set of informative and representative data points for labeling. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, highlight the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
Tasks Active Learning
Published 2019-11-18
URL https://arxiv.org/abs/1911.07716v1
PDF https://arxiv.org/pdf/1911.07716v1.pdf
PWC https://paperswithcode.com/paper/the-effectiveness-of-variational-autoencoders
Repo
Framework
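The abstract does not name its geometric selection rule. As an assumption, the sketch below uses greedy k-center selection, a common core-set heuristic, on hypothetical 2-D latent codes standing in for VAE encoder outputs.

```python
import math


def k_center_greedy(points, k, labeled=()):
    """Greedy k-center core-set: repeatedly pick the point farthest from all chosen centers."""
    centers = list(labeled)
    selected = []
    if not centers:  # nothing labeled yet: seed with an arbitrary point
        centers.append(0)
        selected.append(0)
    # dist[i] = distance from point i to its nearest center so far.
    dist = [min(math.dist(p, points[c]) for c in centers) for p in points]
    while len(selected) < k:
        far = max(range(len(points)), key=dist.__getitem__)
        selected.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))
    return selected


# Hypothetical 2-D latent codes, e.g. the means produced by a trained VAE encoder.
latent = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(k_center_greedy(latent, k=3))               # [0, 4, 2]
print(k_center_greedy(latent, k=2, labeled=[0]))  # [4, 2]
```

Near-duplicates such as points 0 and 1 are never picked together, which is the diversity property a core-set is meant to provide; informativeness would need an extra criterion on top of this.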

Learning for Detection: MIMO-OFDM Symbol Detection through Downlink Pilots

Title Learning for Detection: MIMO-OFDM Symbol Detection through Downlink Pilots
Authors Zhou Zhou, Lingjia Liu, Hao-Hsuan Chang
Abstract Reservoir computing (RC) is a special type of recurrent neural network that consists of a fixed, high-dimensional feature mapping and trained readout weights. In this paper, we introduce a new RC structure for multiple-input, multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) symbol detection, namely the windowed echo state network (WESN). The theoretical analysis shows that adding buffers to the input layer can give the underlying neural network an enhanced short-term memory (STM). Furthermore, a unified training framework is developed for the WESN MIMO-OFDM symbol detector using both comb and scattered pilot patterns that are compatible with the structure adopted in 3GPP LTE/LTE-Advanced systems. Complexity analysis suggests the advantages of the WESN-based symbol detector over state-of-the-art symbol detectors such as linear minimum mean square error (LMMSE) detection and the sphere decoder when the system uses a large number of OFDM sub-carriers. Numerical evaluations illustrate the advantage of the introduced WESN-based symbol detector and demonstrate that improving STM can significantly improve symbol detection performance and effectively mitigate model mismatch effects compared to existing methods.
Tasks
Published 2019-06-25
URL https://arxiv.org/abs/1907.01516v2
PDF https://arxiv.org/pdf/1907.01516v2.pdf
PWC https://paperswithcode.com/paper/learn-to-demodulate-mimo-ofdm-symbol
Repo
Framework
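A toy sketch of the windowed-input idea, not the paper's detector: a small echo state network whose input is a buffer of the last `window` symbols, with a ridge-regression readout, trained to recall the ±1 symbol sent two steps earlier. All sizes, the row-sum scaling of the recurrent weights, and the choice to let the readout also see the input buffer are our assumptions; pilots, OFDM and MIMO are omitted.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wesn_features(u, window, n_res=20, seed=0):
    """Run a reservoir driven by a sliding buffer of the last `window` inputs."""
    rng = random.Random(seed)
    W_in = [[rng.uniform(-1, 1) for _ in range(window)] for _ in range(n_res)]
    W = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    # Rescale so every row's absolute sum is at most 0.9: with tanh this makes the
    # state update a contraction, a sufficient condition for the echo state property.
    norm = max(sum(abs(w) for w in row) for row in W)
    W = [[0.9 * w / norm for w in row] for row in W]
    state, feats = [0.0] * n_res, []
    for t in range(len(u)):
        buf = [u[t - d] if t >= d else 0.0 for d in range(window)]  # input buffer
        state = [math.tanh(sum(W_in[i][d] * buf[d] for d in range(window))
                           + sum(W[i][j] * state[j] for j in range(n_res)))
                 for i in range(n_res)]
        feats.append([1.0] + state + buf)  # readout sees bias, reservoir state, buffer
    return feats


def ridge_readout(X, y, lam=1e-6):
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    rhs = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(d)]
    return solve(A, rhs)


def delay_task_mse(window, delay=2, T=300, washout=10):
    """Train the readout to output u[t - delay]; return the training MSE."""
    rng = random.Random(42)
    u = [rng.choice([-1.0, 1.0]) for _ in range(T)]  # BPSK-like symbols
    X = wesn_features(u, window)[washout:]
    y = [u[t - delay] for t in range(washout, T)]
    w = ridge_readout(X, y)
    preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


mse3, mse1 = delay_task_mse(window=3), delay_task_mse(window=1)
print(f"training MSE: window=3 -> {mse3:.2e}, window=1 -> {mse1:.2e}")
```

With `window=3` the delayed symbol sits in the buffer, so the readout can recover it almost exactly, while with `window=1` it has to rely on the reservoir's fading memory of past inputs.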

Dynamic Scale Inference by Entropy Minimization

Title Dynamic Scale Inference by Entropy Minimization
Authors Dequan Wang, Evan Shelhamer, Bruno Olshausen, Trevor Darrell
Abstract Given the variety of the visual world, there is not one true scale for recognition: objects may appear at drastically different sizes across the visual field. Rather than enumerating variations across filter channels or pyramid levels, dynamic models locally predict scale and adapt receptive fields accordingly. The degree of variation and diversity of inputs makes this a difficult task. Existing methods either learn a feedforward predictor, which is not itself totally immune to the scale variation it is meant to counter, or select scales by a fixed algorithm, which cannot learn from the given task and data. We extend dynamic scale inference from feedforward prediction to iterative optimization for further adaptivity. We propose a novel entropy minimization objective for inference and optimize over task and structure parameters to tune the model to each input. Optimization during inference improves semantic segmentation accuracy and generalizes better to extreme scale variations that cause feedforward dynamic inference to falter.
Tasks Semantic Segmentation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03182v1
PDF https://arxiv.org/pdf/1908.03182v1.pdf
PWC https://paperswithcode.com/paper/dynamic-scale-inference-by-entropy
Repo
Framework

An Entity-Driven Framework for Abstractive Summarization

Title An Entity-Driven Framework for Abstractive Summarization
Authors Eva Sharma, Luyang Huang, Zhe Hu, Lu Wang
Abstract Abstractive summarization systems aim to produce more coherent and concise summaries than their extractive counterparts. Popular neural models have achieved impressive results for single-document summarization, yet their outputs are often incoherent and unfaithful to the input. In this paper, we introduce SENECA, a novel System for ENtity-drivEn Coherent Abstractive summarization that leverages entity information to generate informative and coherent abstracts. Our framework takes a two-step approach: (1) an entity-aware content selection module first identifies salient sentences from the input, then (2) an abstract generation module conducts cross-sentence information compression and abstraction to generate the final summary, and is trained with rewards that promote coherence, conciseness, and clarity. The two components are further connected using reinforcement learning. Automatic evaluation shows that our model significantly outperforms the previous state of the art on ROUGE and on our proposed coherence measures on the New York Times and CNN/Daily Mail datasets. Human judges further rate our system’s summaries as more informative and coherent than those produced by popular summarization models.
Tasks Abstractive Text Summarization, Document Summarization
Published 2019-09-04
URL https://arxiv.org/abs/1909.02059v1
PDF https://arxiv.org/pdf/1909.02059v1.pdf
PWC https://paperswithcode.com/paper/an-entity-driven-framework-for-abstractive
Repo
Framework

Bias-Aware Heapified Policy for Active Learning

Title Bias-Aware Heapified Policy for Active Learning
Authors Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun
Abstract The data efficiency of learning-based algorithms is increasingly important, since high-quality, clean data is expensive and hard to collect. Active learning aims to achieve high model performance with as few samples as possible by querying the most important subset of data from the original dataset. In the active learning domain, a mainstream line of research is heuristic uncertainty-based methods, which are useful for learning-based systems. Recently, a few works have proposed applying policy reinforcement learning (PRL) to query important data. PRL seems more general than heuristic uncertainty-based methods because it depends on data features, which are more reliable than human priors. However, two problems arise when applying PRL to active learning: sample inefficiency of policy learning and overconfidence. More precisely, policy learning is sample-inefficient when sampling within a large action space, while class imbalance can lead to overconfidence. In this paper, we propose a bias-aware policy network called Heapified Active Learning (HAL), which prevents overconfidence and improves the sample efficiency of policy learning through a heapified structure without ignoring global information (an overview of the whole unlabeled set). In our experiments, HAL outperforms other baseline methods on the MNIST dataset and on duplicated MNIST. Finally, we investigate the generalization of the HAL policy learned on the MNIST dataset by applying it directly to MNIST-M. We show that the agent can generalize and outperform a directly learned policy when the labeled set is constrained.
Tasks Active Learning
Published 2019-11-18
URL https://arxiv.org/abs/1911.07574v1
PDF https://arxiv.org/pdf/1911.07574v1.pdf
PWC https://paperswithcode.com/paper/bias-aware-heapified-policy-for-active
Repo
Framework

Online Adaptive Asymmetric Active Learning with Limited Budgets

Title Online Adaptive Asymmetric Active Learning with Limited Budgets
Authors Yifan Zhang, Peilin Zhao, Shuaicheng Niu, Qingyao Wu, Jiezhang Cao, Junzhou Huang, Mingkui Tan
Abstract Online Active Learning (OAL) aims to handle an unlabeled data stream by selectively querying the labels of data points. OAL is applicable to many real-world problems, such as anomaly detection in health care and finance. In these problems there are two key challenges: the query budget is often limited, and the ratio between classes is highly imbalanced. In practice, it is quite difficult to handle an imbalanced unlabeled data stream when only a limited budget of labels can be queried for training. To solve this, previous OAL studies adopt either asymmetric losses or asymmetric queries (an isolated asymmetric strategy) to tackle the imbalance, and use first-order methods to optimize the cost-sensitive measure. However, the isolated strategy limits their performance under class imbalance, while first-order methods restrict their optimization performance. In this paper, we propose a novel Online Adaptive Asymmetric Active learning algorithm based on a new asymmetric strategy (merging both the asymmetric-loss and asymmetric-query strategies) and second-order optimization. We theoretically analyze its mistake bound and cost-sensitive metric bounds. Moreover, to better balance performance and efficiency, we enhance our algorithm via a sketching technique, which significantly accelerates computation with only slight performance degradation. Promising results demonstrate the effectiveness and efficiency of the proposed methods.
Tasks Active Learning, Anomaly Detection
Published 2019-11-18
URL https://arxiv.org/abs/1911.07498v1
PDF https://arxiv.org/pdf/1911.07498v1.pdf
PWC https://paperswithcode.com/paper/online-adaptive-asymmetric-active-learning
Repo
Framework
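A minimal sketch of a budgeted, asymmetric online active learner, assuming a margin-based randomized query rule in the style of classic selective sampling and a cost-weighted hinge update. Unlike the paper, it uses a first-order update and no sketching; `cost_pos`, `cost_neg` and `b` are hypothetical parameters, and label `+1` denotes the rare class.

```python
import random


def online_asymmetric_al(stream, dim, budget, b=1.0, cost_pos=5.0, cost_neg=1.0, seed=0):
    """Budgeted, cost-sensitive online active learner (first-order simplification)."""
    rng = random.Random(seed)
    w, queries = [0.0] * dim, 0
    for x, y in stream:
        if queries >= budget:
            break  # budget exhausted: the model is frozen from here on
        margin = sum(wi * xi for wi, xi in zip(w, x))
        # Asymmetric query rule: query more often near the boundary and, via the
        # larger constant, more often when the rare class (+1) is predicted.
        c = cost_pos if margin >= 0 else cost_neg
        if rng.random() < b * c / (b * c + abs(margin)):
            queries += 1  # the label y is revealed only when queried
            cost = cost_pos if y == 1 else cost_neg  # asymmetric loss weight
            if y * margin < 1.0:  # cost-weighted hinge-loss update
                w = [wi + cost * y * xi for wi, xi in zip(w, x)]
    return w, queries


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Imbalanced toy stream: about 10% positives; features are (bias, value).
rng = random.Random(7)
stream = []
for _ in range(1000):
    if rng.random() < 0.1:
        stream.append(((1.0, rng.uniform(1.0, 3.0)), 1))    # rare positive
    else:
        stream.append(((1.0, rng.uniform(-3.0, -1.0)), -1))  # common negative
w, used = online_asymmetric_al(stream, dim=2, budget=50)
print(w, used, predict(w, (1.0, 2.0)), predict(w, (1.0, -2.0)))
```

Weighting rare-class updates by `cost_pos` and querying more eagerly when the rare class is predicted are the two asymmetries the abstract contrasts; the paper combines them with second-order updates instead.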

Robust Multi-agent Counterfactual Prediction

Title Robust Multi-agent Counterfactual Prediction
Authors Alexander Peysakhovich, Christian Kroer, Adam Lerer
Abstract We consider the problem of using logged data to make predictions about what would happen if we changed the “rules of the game” in a multi-agent system. This task is difficult because in many cases we observe the actions individuals take but not their private information or their full reward functions. In addition, agents are strategic, so when the rules change, they will also change their actions. Existing methods (e.g. structural estimation, inverse reinforcement learning) make counterfactual predictions by constructing a model of the game, adding the assumption that agents’ behavior comes from optimizing given some goals, and then inverting observed actions to learn agents’ underlying utility functions (a.k.a. types). Once the agent types are known, making counterfactual predictions amounts to solving for the equilibrium of the counterfactual environment. This approach imposes heavy assumptions, such as the rationality of the agents being observed, the correctness of the analyst’s model of the environment and of the parametric form of the agents’ utility functions, and various other conditions needed to make point identification possible. We propose a method for analyzing the sensitivity of counterfactual conclusions to violations of these assumptions, which we refer to as robust multi-agent counterfactual prediction (RMAC). We apply our technique to investigate the robustness of counterfactual claims for classic environments in market design: auctions, school choice, and social choice. Importantly, we show RMAC can be used in regimes where point identification is impossible (e.g. those which have multiple equilibria or non-injective maps from type distributions to outcomes).
Tasks
Published 2019-04-03
URL http://arxiv.org/abs/1904.02235v1
PDF http://arxiv.org/pdf/1904.02235v1.pdf
PWC https://paperswithcode.com/paper/robust-multi-agent-counterfactual-prediction
Repo
Framework

A Research Agenda: Dynamic Models to Defend Against Correlated Attacks

Title A Research Agenda: Dynamic Models to Defend Against Correlated Attacks
Authors Ian Goodfellow
Abstract In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn $\textit{independently}$ and for which the distribution generating each example is $\textit{identical}$. When these assumptions are relaxed, modern machine learning can perform very poorly. When machine learning is used in contexts where security is a concern, it is desirable to design models that perform well even when the input is designed by a malicious adversary. So far most research in this direction has focused on an adversary who violates the $\textit{identical}$ assumption, and imposes some kind of restricted worst-case distribution shift. I argue that machine learning security researchers should also address the problem of relaxing the $\textit{independence}$ assumption and that current strategies designed for robustness to distribution shift will not do so. I recommend $\textit{dynamic models}$ that change each time they are run as a potential solution path to this problem, and show an example of a simple attack using correlated data that can be mitigated by a simple dynamic defense. This is not intended as a real-world security measure, but as a recommendation to explore this research direction and develop more realistic defenses.
Tasks
Published 2019-03-14
URL http://arxiv.org/abs/1903.06293v1
PDF http://arxiv.org/pdf/1903.06293v1.pdf
PWC https://paperswithcode.com/paper/a-research-agenda-dynamic-models-to-defend
Repo
Framework

Towards a Characterization of Explainable Systems

Title Towards a Characterization of Explainable Systems
Authors Dimitri Bohlender, Maximilian A. Köhl
Abstract As software-driven systems grow ever more complex and autonomous, building systems that are easily understood becomes a challenge. Accordingly, recent research efforts aim to aid the design of explainable systems. Nevertheless, a common notion of what it takes for a system to be explainable is still missing. To address this problem, we propose a characterization of explainable systems that consolidates existing research. By providing a unified terminology, we lay a basis for classifying both existing and future research and for formulating precise requirements for such systems.
Tasks
Published 2019-01-31
URL http://arxiv.org/abs/1902.03096v1
PDF http://arxiv.org/pdf/1902.03096v1.pdf
PWC https://paperswithcode.com/paper/towards-a-characterization-of-explainable
Repo
Framework