Paper Group ANR 912
Item Response Theory based Ensemble in Machine Learning. Critical Point Finding with Newton-MR by Analogy to Computing Square Roots. An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis. Learning to grow: control of materials self-assembly using evolutionary reinforcement learning. …
Item Response Theory based Ensemble in Machine Learning
Title | Item Response Theory based Ensemble in Machine Learning |
Authors | Ziheng Chen, Hongshik Ahn |
Abstract | In this article, we propose a novel probabilistic framework to improve the accuracy of a weighted majority voting algorithm. In order to assign higher weights to the classifiers that can correctly classify hard-to-classify instances, we introduce the Item Response Theory (IRT) framework to evaluate the samples’ difficulty and the classifiers’ ability simultaneously. Three models are created with different assumptions suitable for different cases. When making an inference, we keep a balance between accuracy and complexity. In our experiments, all the base models are single trees constructed via bootstrap. To explain the models, we illustrate how the IRT ensemble constructs the classification boundary. We also compare their performance with other widely used methods and show that our model performs well on 19 datasets. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04616v1 |
https://arxiv.org/pdf/1911.04616v1.pdf | |
PWC | https://paperswithcode.com/paper/item-response-theory-based-ensemble-in |
Repo | |
Framework | |
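To make the IRT-based weighting concrete, below is a minimal sketch (not the authors' three models) of the simplest case, a Rasch-style fit: a binary response matrix `R` records which classifiers got which validation samples right, abilities and difficulties are estimated by gradient ascent on the Rasch log-likelihood, and the abilities become voting weights. All names (`fit_rasch`, `irt_weighted_vote`), the learning rate, and the softmax weighting are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_rasch(R, lr=0.05, n_iter=2000):
    """Fit a Rasch (1-parameter IRT) model to a binary response matrix.

    R[j, i] = 1 if classifier j labels validation sample i correctly.
    Returns classifier abilities (theta) and sample difficulties (b).
    """
    n_clf, n_samples = R.shape
    theta = np.zeros(n_clf)        # classifier ability
    b = np.zeros(n_samples)        # sample difficulty
    for _ in range(n_iter):
        p = sigmoid(theta[:, None] - b[None, :])   # predicted P(correct)
        err = R - p                                # gradient of the log-likelihood
        theta += lr * err.sum(axis=1) / n_samples
        b -= lr * err.sum(axis=0) / n_clf
        b -= b.mean()                              # fix the location indeterminacy
    return theta, b

def irt_weighted_vote(votes, theta):
    """Weighted majority vote: votes[j, i] in {0, 1}, weights = softmax(ability)."""
    w = np.exp(theta) / np.exp(theta).sum()
    score = (w[:, None] * votes).sum(axis=0)
    return (score > 0.5).astype(int)

# toy usage: 5 bootstrap classifiers evaluated on 20 validation samples
rng = np.random.default_rng(0)
R = (rng.random((5, 20)) < 0.7).astype(float)
theta, b = fit_rasch(R)
votes = (rng.random((5, 20)) < 0.5).astype(float)   # base classifiers' 0/1 predictions
print("abilities:", np.round(theta, 2))
print("ensemble prediction:", irt_weighted_vote(votes, theta)[:10])
```

In the paper, three IRT variants with different assumptions are fitted instead of this single Rasch model, but the role of ability as a voting weight is the same.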
Critical Point Finding with Newton-MR by Analogy to Computing Square Roots
Title | Critical Point Finding with Newton-MR by Analogy to Computing Square Roots |
Authors | Charles G Frye |
Abstract | Understanding the behavior of algorithms for solving the optimization problem (hereafter shortened to OP) of optimizing a differentiable loss function (OP1) is enhanced by knowledge of the critical points of that loss function, i.e. the points where the gradient is 0. Here, we describe a solution to the problem of finding critical points by proposing and solving three optimization problems: 1) minimizing the norm of the gradient (OP2), 2) minimizing the difference between the pre-conditioned update direction and the gradient (OP3), and 3) minimizing the norm of the gradient along the update direction (OP4). The result is a recently-introduced algorithm for optimizing invex functions, Newton-MR, which turns out to be highly effective at finding the critical points of the loss surfaces of neural networks. We precede this derivation with an analogous, but simpler, derivation of the nested-optimization algorithm for computing square roots by combining Heron’s Method with Newton-Raphson division. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05273v1 |
https://arxiv.org/pdf/1906.05273v1.pdf | |
PWC | https://paperswithcode.com/paper/critical-point-finding-with-newton-mr-by |
Repo | |
Framework | |
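The square-root warm-up mentioned at the end of the abstract, Heron's method with the inner division itself replaced by a Newton-Raphson reciprocal iteration, fits in a few lines. A minimal sketch; the initial guesses and iteration counts are arbitrary assumptions:

```python
def newton_raphson_reciprocal(x, y0=0.1, n_iter=20):
    """Approximate 1/x without division: y_{k+1} = y_k * (2 - x * y_k).

    Converges when 0 < x * y0 < 2, so the default guess assumes x is not huge.
    """
    y = y0
    for _ in range(n_iter):
        y = y * (2.0 - x * y)
    return y

def heron_sqrt(a, x0=1.0, n_outer=20):
    """Heron's method x_{k+1} = (x_k + a / x_k) / 2, with the inner division
    solved by the Newton-Raphson iteration above (a nested optimization)."""
    x = x0
    for _ in range(n_outer):
        inv_x = newton_raphson_reciprocal(x)
        x = 0.5 * (x + a * inv_x)
    return x

print(heron_sqrt(2.0))   # ~1.41421356
```

The outer loop plays the role of the critical-point search and the inner loop the role of the (approximately solved) linear sub-problem, which is the analogy the paper develops for Newton-MR.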
An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis
Title | An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis |
Authors | Kang-Ho Lee, JoonHyun Jeong, Sung-Ho Bae |
Abstract | Network compression for deep neural networks has become an important part of deep learning research, because of the increased demand for deep learning models in practical, resource-constrained environments. In this paper, we observe that the weights in adjacent convolution layers share strong similarity in shapes and values, i.e., the weights tend to vary smoothly along the layers. We call this phenomenon the Smoothly Varying Weight Hypothesis (SVWH). Based on SVWH and the inter-frame prediction methods of conventional video coding schemes, we propose a new Inter-Layer Weight Prediction (ILWP) and quantization method which quantizes the predicted residuals of the weights. Since the predicted weight residuals tend to follow Laplacian distributions with very low variance, weight quantization can be applied more effectively, producing more zero weights and enhancing the weight compression ratio. In addition, we propose a new loss for eliminating non-texture bits, which enables us to more effectively store only texture bits. That is, the proposed loss regularizes the weights such that the collocated weights of two adjacent layers have the same values. Our comprehensive experiments show that the proposed method achieves a much higher weight compression rate at the same accuracy level than previous quantization-based compression methods for deep neural networks. |
Tasks | Quantization |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.06835v1 |
https://arxiv.org/pdf/1907.06835v1.pdf | |
PWC | https://paperswithcode.com/paper/an-inter-layer-weight-prediction-and |
Repo | |
Framework | |
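A minimal sketch of the inter-layer prediction idea, assuming collocated weight tensors of identical shape in adjacent layers: each layer is predicted from the previous reconstruction, and only the uniformly quantized residual is stored. The step size, the zero predictor for the first layer, and the function names are illustrative assumptions, not the paper's exact coding scheme:

```python
import numpy as np

def quantize(x, step):
    """Uniform scalar quantization with a fixed step size."""
    return np.round(x / step) * step

def inter_layer_residual_quantization(layers, step=0.01):
    """Quantize layer-to-layer weight residuals instead of raw weights.

    layers: list of weight tensors with identical shapes (collocated weights).
    Returns the quantized residuals and the reconstructed weights.
    """
    quantized_residuals, reconstructed = [], []
    prev = np.zeros_like(layers[0])          # first layer is predicted from zeros
    for w in layers:
        residual = w - prev                  # inter-layer prediction error
        q = quantize(residual, step)
        recon = prev + q
        quantized_residuals.append(q)
        reconstructed.append(recon)
        prev = recon                         # predict the next layer from the reconstruction
    return quantized_residuals, reconstructed

# toy usage: three smoothly varying 3x3x8x8 "conv" weight tensors
rng = np.random.default_rng(0)
base = rng.normal(scale=0.1, size=(3, 3, 8, 8))
layers = [base + rng.normal(scale=0.005, size=base.shape) for _ in range(3)]
q_res, recon = inter_layer_residual_quantization(layers)
print("nonzero residual fraction:", np.mean([np.mean(q != 0) for q in q_res]))
```

Under SVWH the residuals are small, so most quantized residuals land on zero, which is what improves the compression ratio.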
Learning to grow: control of materials self-assembly using evolutionary reinforcement learning
Title | Learning to grow: control of materials self-assembly using evolutionary reinforcement learning |
Authors | Stephen Whitelam, Isaac Tamblyn |
Abstract | We show that neural networks trained by evolutionary reinforcement learning can enact efficient molecular self-assembly protocols. Presented with molecular simulation trajectories, networks learn to change temperature and chemical potential in order to promote the assembly of desired structures or choose between competing polymorphs. In the first case, networks reproduce in a qualitative sense the results of previously-known protocols, but faster and with higher fidelity; in the second case they identify strategies previously unknown, from which we can extract physical insight. Networks that take as input the elapsed time of the simulation or microscopic information from the system are both effective, the latter more so. The evolutionary scheme we have used is simple to implement and can be applied to a broad range of examples of experimental self-assembly, whether or not one can monitor the experiment as it proceeds. Our results have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08333v2 |
https://arxiv.org/pdf/1912.08333v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-grow-control-of-materials-self |
Repo | |
Framework | |
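The evolutionary scheme is simple enough to sketch. Below, a toy (1 + lambda) search mutates a two-parameter temperature protocol and keeps the best mutant; the `episode_reward` function is a stand-in for running a molecular simulation trajectory and measuring the promoted order parameter, and everything else (policy form, population size, mutation scale) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

def protocol(params, t):
    """Tiny policy: temperature as a function of elapsed time (placeholder form)."""
    w, b = params
    return 1.0 / (1.0 + np.exp(-(w * t + b)))     # temperature in (0, 1)

def episode_reward(params):
    """Stand-in for a molecular-simulation order parameter.

    Here the 'ideal' protocol is simply a slow linear quench; a real application
    would run a simulation trajectory and score the assembled structure instead.
    """
    ts = np.linspace(0, 1, 50)
    target = 1.0 - 0.8 * ts
    return -np.mean((protocol(params, ts) - target) ** 2)

def evolve(n_generations=200, pop_size=20, sigma=0.1):
    """(1 + lambda) evolutionary search: mutate, evaluate, keep the best."""
    best = np.zeros(2)
    best_r = episode_reward(best)
    for _ in range(n_generations):
        for _ in range(pop_size):
            child = best + sigma * rng.normal(size=best.shape)
            r = episode_reward(child)
            if r > best_r:
                best, best_r = child, r
    return best, best_r

params, reward = evolve()
print("best params:", np.round(params, 3), "reward:", round(reward, 4))
```

Because the only feedback is the scalar episode reward, the same loop applies whether or not the experiment can be monitored while it proceeds, which is the point the abstract makes.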
POP-CNN: Predicting Odor’s Pleasantness with Convolutional Neural Network
Title | POP-CNN: Predicting Odor’s Pleasantness with Convolutional Neural Network |
Authors | Danli Wu, Yu Cheng, Dehan Luo, Kin-Yeung Wong, Kevin Hung, Zhijing Yang |
Abstract | Predicting an odor’s pleasantness simplifies the evaluation of odors and has potential applications in the perfume and environmental monitoring industries. Classical algorithms for predicting an odor’s pleasantness generally use a manual feature extractor and an independent classifier. Manually designing a good feature extractor depends on expert knowledge and experience, and is the key to the accuracy of these algorithms. To circumvent this difficulty, we propose a model for predicting an odor’s pleasantness using a convolutional neural network. In our model, the convolutional layers replace the manual feature extractor and show better performance. The experiments show that the correlation between our model’s predictions and human pleasantness ratings is over 90%, and that our model achieves 99.9% accuracy in distinguishing between absolutely pleasant and unpleasant odors. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07821v1 |
http://arxiv.org/pdf/1903.07821v1.pdf | |
PWC | https://paperswithcode.com/paper/pop-cnn-predicting-odors-pleasantness-with |
Repo | |
Framework | |
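A minimal sketch of the idea that convolutional layers replace the hand-crafted feature extractor: a small 1D CNN maps raw sensor-array time series directly to a pleasantness score. The input shape, channel counts, and layer sizes are assumptions; the paper's actual architecture and data format are not reproduced here.

```python
import torch
import torch.nn as nn

class PleasantnessCNN(nn.Module):
    """Minimal 1D CNN regressor: sensor-array time series -> pleasantness score."""
    def __init__(self, n_sensors=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, 1)   # scalar pleasantness rating

    def forward(self, x):
        z = self.features(x).squeeze(-1)   # (batch, 64)
        return self.head(z).squeeze(-1)    # (batch,)

model = PleasantnessCNN()
x = torch.randn(4, 16, 100)            # 4 odours, 16 sensors, 100 time steps
print(model(x).shape)                  # torch.Size([4])
```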
Bayesian Optimization for Policy Search via Online-Offline Experimentation
Title | Bayesian Optimization for Policy Search via Online-Offline Experimentation |
Authors | Benjamin Letham, Eytan Bakshy |
Abstract | Online field experiments are the gold-standard way of evaluating changes to real-world interactive machine learning systems. Yet our ability to explore complex, multi-dimensional policy spaces - such as those found in recommendation and ranking problems - is often constrained by the limited number of experiments that can be run simultaneously. To alleviate these constraints, we augment online experiments with an offline simulator and apply multi-task Bayesian optimization to tune live machine learning systems. We describe practical issues that arise in these types of applications, including biases that arise from using a simulator and assumptions for the multi-task kernel. We measure empirical learning curves which show substantial gains from including data from biased offline experiments, and show how these learning curves are consistent with theoretical results for multi-task Gaussian process generalization. We find that improved kernel inference is a significant driver of multi-task generalization. Finally, we show several examples of Bayesian optimization efficiently tuning a live machine learning system by combining offline and online experiments. |
Tasks | |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.01049v2 |
http://arxiv.org/pdf/1904.01049v2.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-optimization-for-policy-search-via |
Repo | |
Framework | |
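A rough sketch of combining cheap, biased offline evaluations with a few expensive online ones in a single surrogate model. Instead of the paper's multi-task kernel, the task is crudely encoded as an extra input feature to an anisotropic RBF GP, and the next online experiment is chosen by expected improvement; the objective functions, kernel choice, and grid search are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def online(x):    # true objective, measured by a live experiment
    return -(x - 0.6) ** 2

def offline(x):   # cheap but biased simulator of the same objective
    return -(x - 0.5) ** 2 + 0.05 * np.sin(8 * x)

# a few expensive online points, many cheap offline points (1-D policy parameter)
X_on = rng.random((4, 1)); y_on = online(X_on).ravel()
X_off = rng.random((40, 1)); y_off = offline(X_off).ravel()

# crude multi-task encoding: append a task indicator (0 = online, 1 = offline)
X = np.vstack([np.hstack([X_on, np.zeros((len(X_on), 1))]),
               np.hstack([X_off, np.ones((len(X_off), 1))])])
y = np.concatenate([y_on, y_off])

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.2, 0.5]),
                              normalize_y=True).fit(X, y)

# expected improvement on the online task over a candidate grid
cand = np.linspace(0, 1, 201).reshape(-1, 1)
cand_on = np.hstack([cand, np.zeros_like(cand)])
mu, sd = gp.predict(cand_on, return_std=True)
best = y_on.max()
z = (mu - best) / np.maximum(sd, 1e-9)
ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
print("next online experiment at x =", float(cand[np.argmax(ei), 0]))
```

Even though the offline observations are biased, they shape the posterior away from clearly bad regions, so the handful of online experiments can be spent near the optimum.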
Incremental multi-domain learning with network latent tensor factorization
Title | Incremental multi-domain learning with network latent tensor factorization |
Authors | Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic |
Abstract | The prominence of deep learning, the large amounts of annotated data, and increasingly powerful hardware have made it possible to reach remarkable performance on supervised classification tasks, in many cases saturating the training sets. However, the resulting models are specialized to a single, very specific task and domain. Adapting the learned classification to new domains is a hard problem for at least three reasons: (1) the new domains and tasks might be drastically different; (2) there might be a very limited amount of annotated data in the new domain; and (3) full training of a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs. In this paper, we present a method to learn new domains and tasks incrementally, building on prior knowledge from already learned tasks and without catastrophic forgetting. We do so by jointly parametrizing weights across layers using a low-rank Tucker structure. The core is task-agnostic, while a set of task-specific factors is learnt on each new domain. We show that leveraging tensor structure enables better performance than simply using matrix operations. Joint tensor modelling also naturally leverages correlations across different layers. Compared with previous methods, which have focused on adapting each layer separately, our approach results in more compact representations for each new task/domain. We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that it offers on average about a 7.5x reduction in the number of parameters with competitive performance in terms of both classification accuracy and Decathlon score. |
Tasks | |
Published | 2019-04-12 |
URL | https://arxiv.org/abs/1904.06345v2 |
https://arxiv.org/pdf/1904.06345v2.pdf | |
PWC | https://paperswithcode.com/paper/incremental-multi-domain-learning-with |
Repo | |
Framework | |
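A minimal numpy sketch of the parametrization: a single task-agnostic Tucker core is shared across domains, and each new domain only adds small factor matrices. The ranks, shapes, and random initialization are illustrative assumptions (a real implementation would learn the factors by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def tucker_reconstruct(core, factors):
    """Mode-n products of a 4-way core with four factor matrices (Tucker form)."""
    w = np.einsum('abcd,ia->ibcd', core, factors[0])
    w = np.einsum('ibcd,jb->ijcd', w, factors[1])
    w = np.einsum('ijcd,kc->ijkd', w, factors[2])
    w = np.einsum('ijkd,ld->ijkl', w, factors[3])
    return w

# shared, task-agnostic core (small ranks) for a 64x64x3x3 conv layer
ranks, full = (8, 8, 3, 3), (64, 64, 3, 3)
core = rng.normal(size=ranks)

def new_task_factors():
    """Task-specific factors learned per domain (here just randomly initialized)."""
    return [rng.normal(scale=0.1, size=(full[i], ranks[i])) for i in range(4)]

task_a, task_b = new_task_factors(), new_task_factors()
w_a = tucker_reconstruct(core, task_a)     # weights specialized to domain A
w_b = tucker_reconstruct(core, task_b)     # weights specialized to domain B
print(w_a.shape, w_b.shape)                # (64, 64, 3, 3) twice
```

For these shapes the full layer would cost 64 x 64 x 3 x 3 = 36,864 weights per task, while each extra domain here costs only the four factor matrices (about a thousand parameters) on top of the shared core.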
Provably efficient RL with Rich Observations via Latent State Decoding
Title | Provably efficient RL with Rich Observations via Latent State Decoding |
Authors | Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford |
Abstract | We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states. |
Tasks | Q-Learning |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.09018v2 |
https://arxiv.org/pdf/1901.09018v2.pdf | |
PWC | https://paperswithcode.com/paper/provably-efficient-rl-with-rich-observations |
Repo | |
Framework | |
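A loose, toy illustration of one regression-then-clustering decoding step (not the paper's full inductive procedure or its guarantees): observations are regressed onto the previously decoded labels, and the predicted probability vectors are clustered to define the decoded latent states at the current level. The data generation, the logistic-regression embedding, and k-means are all simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# toy data for one decoding step: rich observations x generated from 3 hidden
# states, plus a stand-in for the previously decoded (state, action) index
n, d, n_prev = 600, 10, 3
true_state = rng.integers(0, 3, size=n)
prev_label = (true_state + rng.integers(0, 2, size=n)) % n_prev   # correlated context
x = rng.normal(size=(n, d)) + 2.0 * np.eye(3)[true_state] @ rng.normal(size=(3, d))

# 1) regression step: predict the previously decoded label from the current
#    observation; the predicted probabilities form a low-dimensional embedding
reg = LogisticRegression(max_iter=1000).fit(x, prev_label)
embedding = reg.predict_proba(x)

# 2) clustering step: cluster the embeddings; cluster ids become the decoded
#    latent states at this level, which then label the next regression problem
decoded = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
print("decoded-state counts:", np.bincount(decoded))
```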
Counterfactual States for Atari Agents via Generative Deep Learning
Title | Counterfactual States for Atari Agents via Generative Deep Learning |
Authors | Matthew L. Olson, Lawrence Neal, Fuxin Li, Weng-Keen Wong |
Abstract | Although deep reinforcement learning agents have produced impressive results in many domains, their decision making is difficult to explain to humans. To address this problem, past work has mainly focused on explaining why an action was chosen in a given state. A different type of explanation that is useful is a counterfactual, which deals with “what if?” scenarios. In this work, we introduce the concept of a counterfactual state to help humans gain a better understanding of what would need to change (minimally) in an Atari game image for the agent to choose a different action. We introduce a novel method to create counterfactual states from a generative deep learning architecture. In addition, we evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our user study results suggest that our generated counterfactual states are useful in helping non-expert participants gain a better understanding of an agent’s decision making process. |
Tasks | Decision Making |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12969v1 |
https://arxiv.org/pdf/1909.12969v1.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-states-for-atari-agents-via |
Repo | |
Framework | |
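A minimal sketch of generating a counterfactual state by searching the latent space of a generative model: starting from the latent code of the original state, gradient descent looks for a nearby code whose decoded state makes the policy prefer a different action. The decoder and policy below are untrained stand-ins, and the loss weighting is an arbitrary assumption; the paper's actual architecture for Atari frames is not reproduced.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# stand-in networks: a real application would use a trained generative model
# of game frames and the agent's policy network
latent_dim, state_dim, n_actions = 16, 64, 4
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, state_dim))
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

def counterfactual_state(z0, target_action, steps=200, lr=0.05, dist_weight=0.1):
    """Search latent space for a state close to decoder(z0) for which the agent
    prefers `target_action` (a minimal change that flips the decision)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_action])
    for _ in range(steps):
        opt.zero_grad()
        logits = policy(decoder(z)).unsqueeze(0)
        loss = nn.functional.cross_entropy(logits, target) \
             + dist_weight * (z - z0).pow(2).sum()      # stay close to the original
        loss.backward()
        opt.step()
    return decoder(z).detach()

z0 = torch.randn(latent_dim)
original_action = policy(decoder(z0)).argmax().item()
cf = counterfactual_state(z0, target_action=(original_action + 1) % n_actions)
print("original action:", original_action,
      "counterfactual action:", policy(cf).argmax().item())
```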
Squeeze-and-Attention Networks for Semantic Segmentation
Title | Squeeze-and-Attention Networks for Semantic Segmentation |
Authors | Zilong Zhong, Zhong Qiu Lin, Rene Bidart, Xiaodan Hu, Ibrahim Ben Daya, Zhifeng Li, Wei-Shi Zheng, Jonathan Li, Alexander Wong |
Abstract | The recent integration of attention mechanisms into segmentation networks improves their representational capabilities through a greater emphasis on more informative features. However, these attention mechanisms ignore an implicit sub-task of semantic segmentation and are constrained by the grid structure of convolution kernels. In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction. Specifically, the proposed SA modules impose pixel-group attention on conventional convolution by introducing an ‘attention’ convolutional channel, thus taking into account spatial-channel inter-dependencies in an efficient manner. The final segmentation results are produced by merging outputs from four hierarchical stages of a SANet to integrate multi-scale contexts for an enhanced pixel-wise prediction. Empirical experiments on two challenging public datasets validate the effectiveness of the proposed SANets, which achieve 83.2% mIoU (without COCO pre-training) on PASCAL VOC and a state-of-the-art mIoU of 54.4% on PASCAL Context. |
Tasks | Semantic Segmentation |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03402v4 |
https://arxiv.org/pdf/1909.03402v4.pdf | |
PWC | https://paperswithcode.com/paper/squeeze-and-attention-networks-for-semantic |
Repo | |
Framework | |
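A rough PyTorch sketch of a squeeze-and-attention style block, loosely following the description above: a standard convolutional branch is modulated by an attention branch computed on a down-sampled copy of the input (squeezed, but not fully pooled as in squeeze-and-excitation), and the attention map is both multiplied with and added to the features. The exact layer configuration, normalization, and output combination are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeAttention(nn.Module):
    """Sketch of a squeeze-and-attention block: a conv branch re-weighted and
    augmented by an 'attention' branch computed on a down-sampled input."""
    def __init__(self, in_ch, out_ch, squeeze=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.attn = nn.Sequential(
            nn.AvgPool2d(squeeze),
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        feat = self.conv(x)
        attn = self.attn(x)
        attn = F.interpolate(attn, size=feat.shape[2:], mode='bilinear',
                             align_corners=False)
        return feat * attn + attn          # pixel-group re-weighting plus attention map

block = SqueezeAttention(64, 128)
x = torch.randn(2, 64, 32, 32)
print(block(x).shape)                      # torch.Size([2, 128, 32, 32])
```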
SIRUS: Making Random Forests Interpretable
Title | SIRUS: Making Random Forests Interpretable |
Authors | Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet |
Abstract | State-of-the-art learning algorithms, such as random forests or neural networks, are often qualified as ‘‘black-boxes’’ because of the high number and complexity of operations involved in their prediction mechanism. This lack of interpretability is a strong limitation for applications involving critical decisions, typically the analysis of production processes in the manufacturing industry. In such critical contexts, models have to be interpretable, i.e., simple, stable, and predictive. To address this issue, we design SIRUS (Stable and Interpretable RUle Set), a new classification algorithm based on random forests, which takes the form of a short list of rules. While simple models are usually unstable with respect to data perturbation, SIRUS achieves a remarkable stability improvement over cutting-edge methods. Furthermore, SIRUS inherits a predictive accuracy close to random forests, combined with the simplicity of decision trees. These properties are assessed both from a theoretical and empirical point of view, through extensive numerical experiments based on our R/C++ software implementation sirus available from CRAN. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06852v3 |
https://arxiv.org/pdf/1908.06852v3.pdf | |
PWC | https://paperswithcode.com/paper/sirus-making-random-forests-interpretable |
Repo | |
Framework | |
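A toy sketch of the core mechanism, not the full SIRUS algorithm (no frequency threshold p0, no rule post-processing or stability analysis): grow shallow trees on quantile-discretized features so the pool of possible splits is finite, then count how often each split reappears across the forest and keep the most frequent ones as candidate rules. The dataset, tree depth, and bin counts are illustrative assumptions.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# shallow trees on quantile-discretized features keep the pool of possible
# splits small, so the same rules reappear across many trees
Xq = np.quantile(X, np.linspace(0, 1, 11), axis=0)
Xd = np.stack([np.digitize(X[:, j], Xq[1:-1, j]) for j in range(X.shape[1])], axis=1)

forest = RandomForestClassifier(n_estimators=300, max_depth=2, random_state=0)
forest.fit(Xd, y)

# count how often each (feature, threshold) split occurs across the forest
splits = Counter()
for est in forest.estimators_:
    tree = est.tree_
    for node in range(tree.node_count):
        if tree.children_left[node] != -1:          # internal (non-leaf) node
            splits[(int(tree.feature[node]), float(tree.threshold[node]))] += 1

print("most frequent splits (feature index, threshold bin, count):")
for (feat, thr), count in splits.most_common(5):
    print(f"  x[{feat}] <= {thr:.1f}   seen in {count} nodes")
```

The most frequent splits are the raw material for the short, stable rule list that SIRUS outputs.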
Feature Learning Viewpoint of AdaBoost and a New Algorithm
Title | Feature Learning Viewpoint of AdaBoost and a New Algorithm |
Authors | Fei Wang, Zhongheng Li, Fang He, Rong Wang, Weizhong Yu, Feiping Nie |
Abstract | The AdaBoost algorithm is notably resistant to overfitting. Understanding the mystery of this phenomenon is a fascinating fundamental theoretical problem. Many studies have been devoted to explaining it from the statistical view and from margin theory. In this paper, we explain it from a feature-learning viewpoint and propose the AdaBoost+SVM algorithm, which accounts for AdaBoost’s resistance to overfitting in a direct and easily understood way. First, we adopt the AdaBoost algorithm to learn the base classifiers. Then, instead of directly forming a weighted combination of the base classifiers, we regard their outputs as features and feed them to an SVM classifier. From this, new coefficients and a bias are obtained and used to construct the final classifier. We justify this approach and show that as the dimension of these features increases, the performance of the SVM does not degrade, which explains AdaBoost’s resistance to overfitting. |
Tasks | |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03953v1 |
http://arxiv.org/pdf/1904.03953v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-learning-viewpoint-of-adaboost-and-a |
Repo | |
Framework | |
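The two-step procedure described in the abstract is easy to sketch with scikit-learn: fit AdaBoost to obtain the base classifiers (decision stumps by default), then use their predictions as features for a linear SVM instead of AdaBoost's own weighted vote. The dataset, number of estimators, and LinearSVC choice are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) learn the base classifiers with AdaBoost (default base learner: decision stumps)
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# 2) instead of AdaBoost's own weighted vote, treat each base classifier's
#    prediction as one feature and let a linear SVM learn the combination
def base_features(model, X):
    return np.column_stack([est.predict(X) for est in model.estimators_])

svm = LinearSVC().fit(base_features(ada, X_tr), y_tr)

print("AdaBoost accuracy :", ada.score(X_te, y_te))
print("AdaBoost+SVM acc. :", svm.score(base_features(ada, X_te), y_te))
```

The SVM re-learns the combination coefficients and bias over the base-classifier outputs, which is exactly the feature-learning reading of boosting that the paper argues for.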
Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation
Title | Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation |
Authors | Ruben Becker, Imane Hafnaoui, Michael E. Houle, Pan Li, Arthur Zimek |
Abstract | Axis-aligned subspace clustering generally entails searching through an enormous number of subspaces (feature combinations) and evaluating cluster quality within each subspace. In this paper, we tackle the problem of identifying the subsets of features that contribute most significantly to the formation of the local neighborhood surrounding a given data point. For each point, the recently proposed Local Intrinsic Dimension (LID) model is used to identify the axis directions along which features have the greatest local discriminability, or equivalently, the smallest number of LID components that capture the local complexity of the data. We develop an estimator of LID along axis projections, and provide preliminary evidence that this LID decomposition can indicate axis-aligned data subspaces that support the formation of clusters. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06771v1 |
https://arxiv.org/pdf/1907.06771v1.pdf | |
PWC | https://paperswithcode.com/paper/subspace-determination-through-local |
Repo | |
Framework | |
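For context, below is the standard maximum-likelihood (Hill) estimator of LID that such a framework builds on; the paper's axis-projection estimator and the decomposition itself are not reproduced here. The toy data and neighborhood size are assumptions.

```python
import numpy as np

def lid_mle(x, data, k=20):
    """Maximum-likelihood (Hill) estimate of local intrinsic dimensionality at x:
    LID ≈ -( (1/k) * sum_i ln(r_i / r_k) )^(-1), with r_1..r_k the distances
    from x to its k nearest neighbours."""
    d = np.linalg.norm(data - x, axis=1)
    r = np.sort(d[d > 0])[:k]           # drop the zero self-distance
    return -1.0 / np.mean(np.log(r / r[-1]))

# toy data: a 3-D Gaussian embedded in a 10-D ambient space
rng = np.random.default_rng(0)
data = np.hstack([rng.normal(size=(2000, 3)), np.zeros((2000, 7))])

estimates = [lid_mle(data[i], data) for i in range(100)]
print("mean LID estimate:", round(float(np.mean(estimates)), 2))   # close to 3
```

The paper's contribution is to estimate how this quantity decomposes across individual axis projections, so that axes with small LID contributions can be read off as a candidate subspace for the local cluster.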
The Unexpected Unexpected and the Expected Unexpected: How People’s Conception of the Unexpected is Not That Unexpected
Title | The Unexpected Unexpected and the Expected Unexpected: How People’s Conception of the Unexpected is Not That Unexpected |
Authors | Molly S Quinn, Kathleen Campbell, Mark T Keane |
Abstract | The answers people give when asked to ‘think of the unexpected’ for everyday event scenarios appear to be more expected than unexpected. There are expected unexpected outcomes that closely adhere to the given information in a scenario, based on familiar disruptions and common plan failures. There are also unexpected unexpected outcomes that are more inventive, departing from the given information and adding new concepts/actions. However, people tend to conceive of the unexpected as the former more than the latter. Study 1 tests these proposals by analysing the object concepts people mention in their reports of the unexpected and the agreement between their answers. Study 2 shows that object choices are weakly influenced by recency, i.e., the order of sentences in the scenario. The implications of these results for ideas in philosophy, psychology and computing are discussed. |
Tasks | |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.08063v1 |
https://arxiv.org/pdf/1905.08063v1.pdf | |
PWC | https://paperswithcode.com/paper/the-unexpected-unexpected-and-the-expected |
Repo | |
Framework | |
Unsupervised Synthesis of Anomalies in Videos: Transforming the Normal
Title | Unsupervised Synthesis of Anomalies in Videos: Transforming the Normal |
Authors | Abhishek Joshi, Vinay P. Namboodiri |
Abstract | Abnormal activity recognition requires detecting anomalous events, which suffer from a severe imbalance in data. In a video, normal describes activities that conform to usual events, while irregular events that do not conform to the normal are referred to as abnormal. It is far more common to observe normal data than to obtain abnormal data in visual surveillance. In this paper, we propose an approach in which we obtain abnormal data by transforming normal data. This is a challenging task that is solved through a multi-stage pipeline. We utilize a number of techniques from unsupervised segmentation in order to synthesize new samples of data that are transformed from an existing set of normal examples. Furthermore, this synthesis approach has useful applications as a data augmentation technique. An incrementally trained Bayesian convolutional neural network (CNN) is used to carefully select the set of abnormal samples that can be added. Finally, through this synthesis approach, we obtain a comparable set of abnormal samples that can be used to train the CNN to classify normal vs. abnormal samples. We show that this method generalizes to multiple settings by evaluating it on two real-world datasets, achieving improved performance over other probabilistic techniques that have been used in the past for this task. |
Tasks | Activity Recognition, Data Augmentation |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06633v1 |
http://arxiv.org/pdf/1904.06633v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-synthesis-of-anomalies-in-videos |
Repo | |
Framework | |
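A toy numpy sketch of the transform-the-normal idea only (the paper's multi-stage pipeline with unsupervised segmentation and Bayesian-CNN sample selection is not reproduced): a segmented region of a normal frame is cut out and pasted at an unusual location to synthesize a candidate abnormal sample. The frame, mask, and crude in-painting are illustrative assumptions.

```python
import numpy as np

def synthesize_abnormal(frame, mask, shift):
    """Create a candidate 'abnormal' frame from a normal one by cutting out the
    segmented region given by `mask` and pasting it at a shifted location
    (a stand-in for placing a known object in an unusual position)."""
    out = frame.copy()
    ys, xs = np.nonzero(mask)
    patch = frame[ys, xs]
    out[ys, xs] = frame.mean()                          # crude in-painting of the hole
    ys2 = np.clip(ys + shift[0], 0, frame.shape[0] - 1)
    xs2 = np.clip(xs + shift[1], 0, frame.shape[1] - 1)
    out[ys2, xs2] = patch                               # paste at the new location
    return out

# toy normal frame: a bright square (e.g. an object on a walkway)
frame = np.zeros((64, 64))
frame[40:48, 10:18] = 1.0
mask = frame > 0.5
abnormal = synthesize_abnormal(frame, mask, shift=(-30, 25))   # appears off the walkway
print("object moved:", not np.array_equal(frame, abnormal))
```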