Paper Group ANR 898
Batch-Shaping for Learning Conditional Channel Gated Networks. A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs. Unsupervised Learning through Temporal Smoothing and Entropy Maximization. Distributional reinforcement learning with linear function approximation. Nonstationary Multivariate Gaussian Processes for Electronic Health Records. …
Batch-Shaping for Learning Conditional Channel Gated Networks
Title | Batch-Shaping for Learning Conditional Channel Gated Networks |
Authors | Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling |
Abstract | We present a method that trains large capacity neural networks with significantly improved accuracy and lower dynamic computational cost. We achieve this by gating the deep-learning architecture at a fine-grained level. Individual convolutional maps are turned on/off conditionally on features in the network. To achieve this, we introduce a new residual block architecture that gates convolutional channels in a fine-grained manner. We also introduce a generally applicable tool, $batch$-$shaping$, that matches the marginal aggregate posteriors of features in a neural network to a pre-specified prior distribution. We use this novel technique to force gates to be more conditional on the data. We present results on the CIFAR-10 and ImageNet datasets for image classification, and Cityscapes for semantic segmentation. Our results show that our method can slim down large architectures conditionally, such that the average computational cost on the data is on par with a smaller architecture, but with higher accuracy. In particular, on ImageNet, our ResNet50 and ResNet34 gated networks obtain 74.60% and 72.55% top-1 accuracy compared to the 69.76% accuracy of the baseline ResNet18 model, for similar complexity. We also show that the resulting networks automatically learn to use more features for difficult examples and fewer features for simple examples. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06627v3 |
https://arxiv.org/pdf/1907.06627v3.pdf | |
PWC | https://paperswithcode.com/paper/batch-shaped-channel-gated-networks |
Repo | |
Framework | |
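The entry above describes two ingredients: residual blocks whose convolutional channels are gated conditionally on the input, and a batch-shaping loss that pulls the aggregate distribution of gate activations toward a pre-specified prior. Below is a minimal PyTorch-style sketch of a conditionally gated residual block; the module layout and names are illustrative assumptions, not the paper's exact architecture, and the batch-shaping loss itself is not reproduced.

```python
import torch
import torch.nn as nn

class ChannelGatedBlock(nn.Module):
    """Illustrative residual block with per-channel conditional gates
    (hypothetical layout, not the paper's exact gating unit)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        # Lightweight gating head: global pooling then a linear layer, giving one
        # value in (0, 1) per channel, conditioned on the block's input features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        g = self.gate(x)                        # (N, C) conditional gates
        out = torch.relu(self.bn1(self.conv1(x)))
        out = out * g[:, :, None, None]         # switch channels on/off per example
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x), g

block = ChannelGatedBlock(64)
y, gates = block(torch.randn(2, 64, 32, 32))
```

The batch-shaping loss would then be computed on the gate values `gates` aggregated over a batch, comparing their per-channel empirical distribution against the chosen prior; the exact criterion is specified in the paper.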
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Title | A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs |
Authors | Koyel Mukherjee, Alind Khare, Ashish Verma |
Abstract | Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. In particular, for adversarial training or for training a newly synthesized model, one would not know the best learning rate regime beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models for both natural and adversarial training, without requiring any dataset/model specific tuning. It is a stand-alone, parameterless, adaptive approach with no computational overhead. We theoretically discuss the algorithm's convergence behavior. We empirically validate our algorithm extensively. Our results show that our proposed approach *consistently* achieves top-level accuracy compared to SOTA baselines in the literature in natural as well as adversarial training. |
Tasks | |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11605v1 |
https://arxiv.org/pdf/1910.11605v1.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-dynamic-learning-rate-tuning-1 |
Repo | |
Framework | |
Unsupervised Learning through Temporal Smoothing and Entropy Maximization
Title | Unsupervised Learning through Temporal Smoothing and Entropy Maximization |
Authors | Per Rutquist |
Abstract | This paper proposes a method for machine learning from unlabeled data in the form of a time-series. The mapping that is learned is shown to extract slowly evolving information that would be useful for control applications, while efficiently filtering out unwanted, higher-frequency noise. The method consists of training a feedforward artificial neural network with backpropagation using two opposing objectives. The first of these is to minimize the squared changes in activations between time steps of each unit in the network. This “temporal smoothing” has the effect of correlating inputs that occur close in time with outputs that are close in the L2-norm. The second objective is to maximize the log determinant of the covariance matrix of activations in each layer of the network. This objective ensures that information from each layer is passed through to the next. This second objective acts as a balance to the first, which on its own would result in a network with all input weights equal to zero. |
Tasks | Time Series |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03100v1 |
https://arxiv.org/pdf/1905.03100v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-through-temporal |
Repo | |
Framework | |
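The two opposing objectives in the abstract translate directly into a loss: a temporal-smoothing term on consecutive activations and a log-determinant term on the activation covariance. A small PyTorch sketch under those definitions (the jitter term and the weighting between the two objectives are assumptions):

```python
import torch

def smoothing_and_logdet_losses(acts, eps=1e-5):
    """Compute the two objectives for one layer's activations over a trajectory.
    `acts` has shape (T, units): activations at T consecutive time steps."""
    # Objective 1: temporal smoothing -- penalise squared activation changes
    # between successive time steps.
    smooth = ((acts[1:] - acts[:-1]) ** 2).sum(dim=1).mean()

    # Objective 2: entropy maximisation -- maximise the log determinant of the
    # activation covariance (returned as a loss, hence the minus sign).
    centered = acts - acts.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    cov = cov + eps * torch.eye(cov.shape[0])   # jitter for numerical stability
    logdet_loss = -torch.logdet(cov)

    return smooth, logdet_loss

# Example: combine with an assumed weighting factor.
acts = torch.randn(128, 16, requires_grad=True)
smooth, logdet_loss = smoothing_and_logdet_losses(acts)
loss = smooth + 0.1 * logdet_loss
loss.backward()
```

As the abstract notes, the smoothing term alone would collapse to zero weights; the log-determinant term is what keeps information flowing through each layer.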
Distributional reinforcement learning with linear function approximation
Title | Distributional reinforcement learning with linear function approximation |
Authors | Marc G. Bellemare, Nicolas Le Roux, Pablo Samuel Castro, Subhodeep Moitra |
Abstract | Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramér distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramér distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramér-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramér-based distributional methods may perform worse than directly approximating the value function. |
Tasks | Distributional Reinforcement Learning |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03149v1 |
http://arxiv.org/pdf/1902.03149v1.pdf | |
PWC | https://paperswithcode.com/paper/distributional-reinforcement-learning-with |
Repo | |
Framework | |
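For reference, the squared Cramér distance between two distributions on a common, evenly spaced support is the scaled squared L2 distance between their cumulative distribution functions; the paper's contribution is to extend this to arbitrary, not necessarily normalized, real vectors. A small numpy sketch of the standard discrete case:

```python
import numpy as np

def cramer_distance_sq(p, q, gap=1.0):
    """Squared Cramér distance between two discrete distributions on a common
    grid with spacing `gap` (illustrative helper; the paper generalises this to
    arbitrary real vectors that need not sum to one)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    cdf_diff = np.cumsum(p - q)
    return gap * np.sum(cdf_diff ** 2)

# Example: two categorical value distributions over the same support.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(cramer_distance_sq(p, q))
```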
Nonstationary Multivariate Gaussian Processes for Electronic Health Records
Title | Nonstationary Multivariate Gaussian Processes for Electronic Health Records |
Authors | Rui Meng, Braden Soper, Herbert Lee, Vincent X. Liu, John D. Greene, Priyadip Ray |
Abstract | We propose multivariate nonstationary Gaussian processes for jointly modeling multiple clinical variables, where the key parameters (length-scales, standard deviations and the correlations between the observed outputs) are all time dependent. We perform posterior inference via Hamiltonian Monte Carlo (HMC). We also provide methods for obtaining computationally efficient gradient-based maximum a posteriori (MAP) estimates. We validate our model on synthetic data as well as on electronic health records (EHR) data from Kaiser Permanente (KP). We show that the proposed model provides better predictive performance over a stationary model as well as uncovers interesting latent correlation processes across vitals which are potentially predictive of patient risk. |
Tasks | Gaussian Processes |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05851v1 |
https://arxiv.org/pdf/1910.05851v1.pdf | |
PWC | https://paperswithcode.com/paper/nonstationary-multivariate-gaussian-processes |
Repo | |
Framework | |
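One standard way to make a GP's length-scale and standard deviation time dependent, as the abstract describes, is a Gibbs-type nonstationary kernel. The sketch below shows that construction for a single output; it is an illustrative stand-in, since the paper's multivariate model also makes the cross-output correlations time dependent and fits everything with HMC.

```python
import numpy as np

def gibbs_kernel(t1, t2, lengthscale_fn, std_fn):
    """Gibbs non-stationary kernel with time-varying length-scale and standard
    deviation (a common construction; the paper's exact parameterisation differs)."""
    l1, l2 = lengthscale_fn(t1)[:, None], lengthscale_fn(t2)[None, :]
    s1, s2 = std_fn(t1)[:, None], std_fn(t2)[None, :]
    sq = l1**2 + l2**2
    prefac = np.sqrt(2.0 * l1 * l2 / sq)
    return s1 * s2 * prefac * np.exp(-(t1[:, None] - t2[None, :])**2 / sq)

# Example: length-scale and amplitude that both grow slowly with time.
t = np.linspace(0.0, 10.0, 50)
K = gibbs_kernel(t, t, lambda x: 0.5 + 0.1 * x, lambda x: 1.0 + 0.05 * x)
K += 1e-8 * np.eye(len(t))           # jitter so the matrix is safely PSD
sample = np.random.multivariate_normal(np.zeros(len(t)), K)
```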
MmWave Radar Point Cloud Segmentation using GMM in Multimodal Traffic Monitoring
Title | MmWave Radar Point Cloud Segmentation using GMM in Multimodal Traffic Monitoring |
Authors | Feng Jin, Arindam Sengupta, Siyang Cao, Yao-Jan Wu |
Abstract | In multimodal traffic monitoring, we gather traffic statistics for distinct transportation modes, such as pedestrians, cars and bicycles, in order to analyze and improve people's daily mobility in terms of safety and convenience. On account of its robustness to bad light and adverse weather conditions, and its inherent speed measurement ability, the radar sensor is a suitable option for this application. However, the sparse radar data from conventional commercial radars make it extremely challenging for transportation mode classification. Thus, we propose to use a high-resolution millimeter-wave (mmWave) radar sensor to obtain a relatively richer radar point cloud representation for a traffic monitoring scenario. Based on a new feature vector, we use the multivariate Gaussian mixture model (GMM) to do the radar point cloud segmentation, i.e. 'point-wise' classification, in an unsupervised learning environment. In our experiment, we collected radar point clouds for pedestrians and cars, which also contained the inevitable clutter from the surroundings. The experimental results using GMM on the new feature vector demonstrated a good segmentation performance in terms of the intersection-over-union (IoU) metrics. The detailed methodology and validation metrics are presented and discussed. |
Tasks | |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06364v3 |
https://arxiv.org/pdf/1911.06364v3.pdf | |
PWC | https://paperswithcode.com/paper/mmwave-radar-point-cloud-segmentation-using |
Repo | |
Framework | |
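The segmentation step itself is a standard unsupervised GMM fit over per-point feature vectors. A short scikit-learn sketch; the five feature columns below are assumptions for illustration, not the feature vector proposed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative feature matrix for one frame of radar detections: each row is one
# point with assumed columns (x, y, z, doppler_velocity, intensity).
points = np.random.rand(500, 5)

# Unsupervised point-wise segmentation: fit a GMM and assign each point to the
# most likely mixture component (e.g. pedestrian, car, clutter).
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
labels = gmm.fit_predict(points)

# labels[i] is the cluster index of point i; clusters must then be mapped to
# semantic classes (e.g. via their velocity/extent statistics) for IoU evaluation.
```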
Multivariate Convolutional Sparse Coding with Low Rank Tensor
Title | Multivariate Convolutional Sparse Coding with Low Rank Tensor |
Authors | Pierre Humbert, Julien Audiffren, Laurent Oudre, Nicolas Vayatis |
Abstract | This paper introduces a new multivariate convolutional sparse coding model based on tensor algebra, with a general formulation enforcing both element-wise sparsity and low-rankness of the activation tensors. By using the CP decomposition, this model achieves a significantly more efficient encoding of the multivariate signal, particularly in the high-order/high-dimension setting, resulting in better performance. We prove that our model is closely related to the Kruskal tensor regression problem, offering interesting theoretical guarantees for our setting. Furthermore, we provide an efficient optimization algorithm based on alternating optimization to solve this model. Finally, we evaluate our algorithm with a large range of experiments, highlighting its advantages and limitations. |
Tasks | |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03367v1 |
https://arxiv.org/pdf/1908.03367v1.pdf | |
PWC | https://paperswithcode.com/paper/multivariate-convolutional-sparse-coding-with |
Repo | |
Framework | |
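The low-rank part of the model constrains each activation tensor to have rank-R CP structure. A small numpy sketch of what that constraint means (reconstruction from CP factors); the paper's alternating-optimization solver and the element-wise sparsity penalty are not reproduced here.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from rank-R CP factors A (I x R), B (J x R), C (K x R):
    X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

# Example: a rank-2 activation tensor whose temporal factor is additionally sparse.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 2))
B = rng.standard_normal((10, 2))
C = np.where(rng.random((30, 2)) < 0.8, 0.0, rng.standard_normal((30, 2)))
Z = cp_reconstruct(A, B, C)
print(Z.shape)  # (8, 10, 30)
```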
On the expected behaviour of noise regularised deep neural networks as Gaussian processes
Title | On the expected behaviour of noise regularised deep neural networks as Gaussian processes |
Authors | Arnu Pretorius, Herman Kamper, Steve Kroon |
Abstract | Recent work has established the equivalence between deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs). The behaviour of these models depends on the initialisation of the corresponding network. In this work, we consider the impact of noise regularisation (e.g. dropout) on NNGPs, and relate their behaviour to signal propagation theory in noise regularised deep neural networks. For ReLU activations, we find that the best performing NNGPs have kernel parameters that correspond to a recently proposed initialisation scheme for noise regularised ReLU networks. In addition, we show how the noise influences the covariance matrix of the NNGP, producing a stronger prior towards simple functions away from the training points. We verify our theoretical findings with experiments on MNIST and CIFAR-10 as well as on synthetic data. |
Tasks | Gaussian Processes |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05563v1 |
https://arxiv.org/pdf/1910.05563v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-expected-behaviour-of-noise |
Repo | |
Framework | |
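For context, the NNGP kernel of a fully connected ReLU network is computed with the well-known arc-cosine recursion sketched below; `sigma_w2` and `sigma_b2` play the role of the initialisation (kernel) parameters the abstract refers to. The noise-regularised variant analysed in the paper modifies this recursion and is not shown here.

```python
import numpy as np

def relu_nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Standard NNGP kernel recursion for a fully connected ReLU network."""
    K = sigma_w2 * (X @ X.T) / X.shape[1] + sigma_b2
    for _ in range(depth):
        diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
        norm = np.outer(diag, diag)
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        # E[relu(u) relu(v)] for jointly Gaussian (u, v), then the affine layer.
        K = sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)) + sigma_b2
    return K

X = np.random.randn(5, 3)
print(relu_nngp_kernel(X, depth=3))
```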
Deep Structured Mixtures of Gaussian Processes
Title | Deep Structured Mixtures of Gaussian Processes |
Authors | Martin Trapp, Robert Peharz, Franz Pernkopf, Carl E. Rasmussen |
Abstract | Gaussian Processes (GPs) are powerful non-parametric Bayesian regression models that allow exact posterior inference, but exhibit high computational and memory costs. In order to improve scalability of GPs, approximate posterior inference is frequently employed, where a prominent class of approximation techniques is based on local GP experts. However, the local-expert techniques proposed so far are either not well-principled, come with limited approximation guarantees, or lead to intractable models. In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii), when used as GP approximation, captures predictive uncertainties consistently better than previous approximations. In a variety of experiments, we show that deep structured mixtures have a low approximation error and outperform existing expert-based approaches. |
Tasks | Gaussian Processes |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04536v1 |
https://arxiv.org/pdf/1910.04536v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-structured-mixtures-of-gaussian |
Repo | |
Framework | |
High Dimensional Robust $M$-Estimation: Arbitrary Corruption and Heavy Tails
Title | High Dimensional Robust $M$-Estimation: Arbitrary Corruption and Heavy Tails |
Authors | Liu Liu, Tianyang Li, Constantine Caramanis |
Abstract | We consider the problem of sparsity-constrained $M$-estimation when both explanatory and response variables have heavy tails (bounded 4-th moments), or a fraction of arbitrary corruptions. We focus on the $k$-sparse, high-dimensional regime where the number of variables $d$ and the sample size $n$ are related through $n \sim k \log d$. We define a natural condition we call the Robust Descent Condition (RDC), and show that if a gradient estimator satisfies the RDC, then Robust Hard Thresholding (IHT using this gradient estimator), is guaranteed to obtain good statistical rates. The contribution of this paper is in showing that this RDC is a flexible enough concept to recover known results, and obtain new robustness results. Specifically, new results include: (a) For $k$-sparse high-dimensional linear- and logistic-regression with heavy tail (bounded 4-th moment) explanatory and response variables, a linear-time-computable median-of-means gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal; (b) When instead of heavy tails we have $O(1/\sqrt{k}\log(nd))$-fraction of arbitrary corruptions in explanatory and response variables, a near linear-time computable trimmed gradient estimator satisfies the RDC, and hence Robust Hard Thresholding is minimax optimal. We demonstrate the effectiveness of our approach in sparse linear, logistic regression, and sparse precision matrix estimation on synthetic and real-world US equities data. |
Tasks | |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08237v2 |
https://arxiv.org/pdf/1901.08237v2.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-robust-estimation-of-sparse |
Repo | |
Framework | |
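The abstract's recipe, iterative hard thresholding driven by a robust (e.g. median-of-means) gradient estimator, can be sketched in a few lines of numpy for sparse linear regression. The block count, step size and iteration budget below are assumed tuning choices, not values from the paper.

```python
import numpy as np

def mom_gradient(X, y, beta, n_blocks=10):
    """Median-of-means estimate of the squared-loss gradient: average per-example
    gradients within random blocks, then take the coordinate-wise median."""
    idx = np.array_split(np.random.permutation(len(y)), n_blocks)
    block_grads = [X[i].T @ (X[i] @ beta - y[i]) / len(i) for i in idx]
    return np.median(block_grads, axis=0)

def robust_hard_thresholding(X, y, k, step=0.1, iters=200):
    """Iterative hard thresholding driven by the robust gradient estimator."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = beta - step * mom_gradient(X, y, beta)
        support = np.argsort(np.abs(beta))[-k:]      # keep the k largest entries
        mask = np.zeros_like(beta)
        mask[support] = 1.0
        beta = beta * mask
    return beta

# Example: k-sparse ground truth with heavy-tailed noise.
rng = np.random.default_rng(1)
n, d, k = 200, 50, 5
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = 1.0
y = X @ beta_true + rng.standard_t(df=3, size=n)
print(np.round(robust_hard_thresholding(X, y, k)[:8], 2))
```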
Compiling Arguments in an Argumentation Framework into Three-valued Logical Expressions
Title | Compiling Arguments in an Argumentation Framework into Three-valued Logical Expressions |
Authors | Sosuke Moriguchi, Kazuko Takahashi |
Abstract | In this paper, we propose a new method for computing general allocators directly from completeness conditions. A general allocator is an abstraction of all complete labelings for an argumentation framework. Any complete labeling is obtained from a general allocator by assigning logical constants to variables. We proved the existence of general allocators in our previous work. However, that construction requires us to enumerate all complete labelings for the framework, which makes the computation prohibitively slow. The method proposed in this paper enables us to compute general allocators without enumerating complete labelings. It also provides solutions for local allocation, which yield semantics for subsets of the framework. We demonstrate two applications of general allocators: stability, and a new concept for frameworks termed arity. Moreover, the method, including local allocation, is applicable to broad extensions of frameworks, such as argumentation frameworks with set-attacks, bipolar argumentation frameworks, and abstract dialectical frameworks. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01185v1 |
https://arxiv.org/pdf/1911.01185v1.pdf | |
PWC | https://paperswithcode.com/paper/compiling-arguments-in-an-argumentation |
Repo | |
Framework | |
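To make the objects concrete: a complete labeling assigns each argument one of the three values in/out/undec subject to the usual legality conditions, and the general allocator abstracts the set of all such labelings. The brute-force enumeration below illustrates exactly the expensive computation the paper's compilation method avoids; it is not the paper's algorithm.

```python
from itertools import product

def complete_labelings(args, attacks):
    """Enumerate all complete labelings of an abstract argumentation framework by
    brute force.  `attacks` is a set of (attacker, target) pairs."""
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}
    results = []
    for combo in product(('in', 'out', 'undec'), repeat=len(args)):
        lab = dict(zip(args, combo))
        ok = True
        for a in args:
            atts = attackers[a]
            # Legality conditions characterising complete labelings:
            if lab[a] == 'in' and not all(lab[b] == 'out' for b in atts):
                ok = False; break
            if lab[a] == 'out' and not any(lab[b] == 'in' for b in atts):
                ok = False; break
            if lab[a] == 'undec' and (all(lab[b] == 'out' for b in atts)
                                      or any(lab[b] == 'in' for b in atts)):
                ok = False; break
        if ok:
            results.append(lab)
    return results

# Example: a <-> b mutual attack, plus b attacking c.
print(complete_labelings(['a', 'b', 'c'], {('a', 'b'), ('b', 'a'), ('b', 'c')}))
```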
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks
Title | Towards moderate overparameterization: global convergence guarantees for training shallow neural networks |
Authors | Samet Oymak, Mahdi Soltanolkotabi |
Abstract | Many modern neural network architectures are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the capacity to fit any set of labels including random noise. However, given the highly nonconvex nature of the training landscape it is not clear what level and kind of overparameterization is required for first order methods to converge to a global optimum that perfectly interpolates any labels. A number of recent theoretical works have shown that for very wide neural networks, where the number of hidden units is polynomially large in the size of the training data, gradient descent starting from a random initialization does indeed converge to a global optimum. However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models seem to perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor. Thus there is a huge gap between the existing theoretical literature and practical experiments. In this paper we take a step towards closing this gap. Focusing on shallow neural nets and smooth activations, we show that (stochastic) gradient descent when initialized at random converges at a geometric rate to a nearby global optimum as soon as the square root of the number of network parameters exceeds the size of the training data. Our results also benefit from a fast convergence rate and continue to hold for non-differentiable activations such as Rectified Linear Units (ReLUs). |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04674v1 |
http://arxiv.org/pdf/1902.04674v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-moderate-overparameterization-global |
Repo | |
Framework | |
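A quick back-of-the-envelope check of the headline condition, that the guarantee applies once the square root of the parameter count exceeds the number of training points, for an illustrative one-hidden-layer network (the parameter count is an approximation and constants are ignored):

```python
# One-hidden-layer network with d inputs, k hidden units and a scalar output:
# roughly k*d + k parameters.  The guarantee applies (up to constants) once
# sqrt(num_params) exceeds the number of training points n.
d, k, n = 100, 2000, 400
num_params = k * d + k
print(num_params, num_params ** 0.5, num_params ** 0.5 > n)  # 202000, ~449.4, True
```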
Exploration and Coordination of Complementary Multi-Robot Teams In a Hunter and Gatherer Scenario
Title | Exploration and Coordination of Complementary Multi-Robot Teams In a Hunter and Gatherer Scenario |
Authors | Mehdi Dadvar, Saeed Moazami, Harley R. Myler, Hassan Zargarzadeh |
Abstract | This paper considers the problem of dynamic task allocation, where tasks are unknowingly distributed over an environment. We aim to address the multi-robot exploration aspect of the problem while solving the task-allocation aspect. To that end, we first propose a novel nature-inspired approach called “hunter and gatherer”. We consider each task comprised of two sequential subtasks: detection and completion, where each subtask can only be carried out by a certain type of agent. Thus, this approach employs two complementary teams of agents: one agile in detecting (hunters) and another dexterous in completing (gatherers) the tasks. Then, we propose a multi-robot exploration algorithm for hunters and a multi-robot task allocation algorithm for gatherers, both in a distributed manner and based on innovative notions of “certainty and uncertainty profit margins”. Statistical analysis of simulation results confirms the efficacy of the proposed algorithms. Besides, it is statistically proven that the proposed solutions function fairly, i.e. for each type of agent, the overall workload is distributed equally. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.07521v1 |
https://arxiv.org/pdf/1912.07521v1.pdf | |
PWC | https://paperswithcode.com/paper/exploration-and-coordination-of-complementary |
Repo | |
Framework | |
Effective Domain Knowledge Transfer with Soft Fine-tuning
Title | Effective Domain Knowledge Transfer with Soft Fine-tuning |
Authors | Zhichen Zhao, Bowen Zhang, Yuning Jiang, Li Xu, Lei Li, Wei-Ying Ma |
Abstract | Convolutional neural networks require large amounts of data for training. Considering the difficulties of data collection and labeling in some specific tasks, existing approaches generally use models pre-trained on a large source domain (e.g. ImageNet), and then fine-tune them on these tasks. However, the datasets from the source domain are simply discarded in the fine-tuning process. We argue that the source datasets could be better utilized and benefit fine-tuning. This paper first introduces the concept of general discrimination to describe the ability of a network to distinguish untrained patterns, and then experimentally demonstrates that general discrimination could potentially enhance the total discrimination ability on the target domain. Furthermore, we propose a novel and light-weight method, namely soft fine-tuning. Unlike traditional fine-tuning, which directly replaces the optimization objective with a loss function on the target domain, soft fine-tuning effectively keeps general discrimination by holding the previous loss and removing it softly. By doing so, soft fine-tuning improves the robustness of the network to data bias and meanwhile accelerates convergence. We evaluate our approach on several visual recognition tasks. Extensive experimental results support that soft fine-tuning provides consistent improvement on all evaluated tasks, and outperforms the state-of-the-art significantly. Codes will be made available to the public. |
Tasks | Transfer Learning |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02236v1 |
https://arxiv.org/pdf/1909.02236v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-domain-knowledge-transfer-with-soft |
Repo | |
Framework | |
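Read literally, soft fine-tuning keeps the source-domain loss alongside the target-domain loss and removes it gradually rather than discarding it at the start. A hedged sketch of such a combined objective; the linear decay schedule and the helper name are assumptions, not the paper's exact formulation.

```python
def soft_finetune_loss(target_loss, source_loss, step, total_steps):
    """Combine target- and source-domain losses, fading the source term out over
    training (an assumed schedule for how the previous loss is 'removed softly')."""
    alpha = max(0.0, 1.0 - step / total_steps)   # source weight decays to zero
    return target_loss + alpha * source_loss

# Example: 20% of the way through training, the source loss still carries weight 0.8.
print(soft_finetune_loss(0.7, 1.2, step=2_000, total_steps=10_000))  # 0.7 + 0.8*1.2
```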
Iterated Belief Base Revision: A Dynamic Epistemic Logic Approach
Title | Iterated Belief Base Revision: A Dynamic Epistemic Logic Approach |
Authors | Marlo Souza, Álvaro Moreira, Renata Vieira |
Abstract | AGM's belief revision is one of the main paradigms in the study of belief change operations. In this context, belief bases (prioritised bases) have been largely used to specify the agent's belief state, whether representing the agent's 'explicit beliefs' or as a computational model for her belief state. While the connection between iterated AGM-like operations and their encoding in dynamic epistemic logics has been studied before, few works have considered how well-known postulates from iterated belief revision theory can be characterised by means of belief bases and their counterpart in a dynamic epistemic logic. This work investigates how priority graphs, a syntactic representation of preference relations deeply connected to prioritised bases, can be used to characterise belief change operators, focusing on well-known postulates of Iterated Belief Change. We provide syntactic representations of belief change operators in a dynamic context, as well as new negative results regarding the possibility of representing an iterated belief revision operation using transformations on priority graphs. |
Tasks | |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06178v1 |
http://arxiv.org/pdf/1902.06178v1.pdf | |
PWC | https://paperswithcode.com/paper/iterated-belief-base-revision-a-dynamic |
Repo | |
Framework | |