Paper Group ANR 785
Learn-able parameter guided Activation Functions
Title | Learn-able parameter guided Activation Functions |
Authors | S. Balaji, T. Kavya, Natasha Sebastian |
Abstract | In this paper, we explore the concept of adding learn-able slope and mean shift parameters to an activation function to improve the total response region. The characteristics of an activation function depend highly on the values of its parameters. Making the parameters learn-able makes the activation function more dynamic and capable of adapting to the requirements of its neighboring layers. The introduced slope parameter is independent of other parameters in the activation function. The concept was applied to ReLU to develop the Dual Line and DualParametric ReLU activation functions. Evaluation on MNIST and CIFAR10 shows that the proposed activation function Dual Line achieves a top-5 position for mean accuracy among 43 activation functions tested with LENET4, LENET5, and WideResNet architectures. This is the first time more than 40 activation functions were analyzed on the MNIST and CIFAR10 datasets at the same time. The study on the distribution of the positive slope parameter beta indicates that the activation function adapts to the requirements of the neighboring layers. The study shows that model performance increases with the proposed activation functions. |
Tasks | |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10752v1 |
https://arxiv.org/pdf/1912.10752v1.pdf | |
PWC | https://paperswithcode.com/paper/learn-able-parameter-guided-activation |
Repo | |
Framework | |
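The learn-able slope and mean shift idea in the abstract above can be sketched in a few lines. This is only an illustration: the piecewise form, the parameter names `alpha`, `beta`, and `m`, and the default values are assumptions, since the abstract names a positive slope parameter beta and a mean shift but does not give the formula.

```python
from dataclasses import dataclass


@dataclass
class DualLine:
    """Piecewise-linear activation with two slopes and a mean shift.

    Assumed form (not stated in the abstract):
        f(x) = beta * x + m    for x >= 0
        f(x) = alpha * x + m   for x <  0
    """

    alpha: float = 0.0  # negative-side slope (hypothetical default)
    beta: float = 1.0   # positive-side slope
    m: float = 0.0      # mean shift

    def forward(self, x: float) -> float:
        slope = self.beta if x >= 0 else self.alpha
        return slope * x + self.m

    def grads(self, x: float) -> dict:
        """Partial derivatives of f(x) with respect to each parameter."""
        return {
            "alpha": x if x < 0 else 0.0,
            "beta": x if x >= 0 else 0.0,
            "m": 1.0,
        }
```

Under this assumed form, the defaults (`alpha=0`, `beta=1`, `m=0`) reduce to plain ReLU, and `grads` gives the quantities a training loop would use to update the three parameters alongside the network weights.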
An ADMM Based Framework for AutoML Pipeline Configuration
Title | An ADMM Based Framework for AutoML Pipeline Configuration |
Authors | Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy, Djallel Bouneffouf, Gregory Bramble, Horst Samulowitz, Dakuo Wang, Andrew Conn, Alexander Gray |
Abstract | We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer & continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from the UCI ML & OpenML repositories. We observe that on average our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical advantages of this framework. |
Tasks | AutoML |
Published | 2019-05-01 |
URL | https://arxiv.org/abs/1905.00424v5 |
https://arxiv.org/pdf/1905.00424v5.pdf | |
PWC | https://paperswithcode.com/paper/automated-machine-learning-via-admm |
Repo | |
Framework | |
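For readers unfamiliar with ADMM, the decomposition idea mentioned in the abstract above can be seen on a toy problem. The sketch below is generic textbook ADMM in scaled dual form applied to a scalar lasso problem; it is not the paper's AutoML scheme, and the function names and the choice of problem are assumptions made for illustration.

```python
def soft_threshold(v: float, kappa: float) -> float:
    """Proximal operator of kappa * |z|."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_scalar_lasso(a: float, lam: float, rho: float = 1.0, iters: int = 100) -> float:
    """Minimize 0.5 * (x - a)**2 + lam * |z| subject to x = z.

    The objective is split into a smooth part (in x) and a non-smooth
    part (in z); each sub-problem has a closed-form solution.
    """
    x = z = u = 0.0  # u is the scaled dual variable
    for _ in range(iters):
        # x-update: quadratic sub-problem
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on the L1 term
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the constraint violation x - z
        u += x - z
    return z
```

The pattern to notice is that each update touches only one block of variables, which is what lets a framework hand different sub-problems to different solvers.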
On Symmetry and Initialization for Neural Networks
Title | On Symmetry and Initialization for Neural Networks |
Authors | Ido Nachum, Amir Yehudayoff |
Abstract | This work provides an additional step in the theoretical understanding of neural networks. We consider neural networks with one hidden layer and show that when learning symmetric functions, one can choose initial conditions so that standard SGD training efficiently produces generalization guarantees. We empirically verify this and show that this does not hold when the initial conditions are chosen at random. The proof of convergence investigates the interaction between the two layers of the network. Our results highlight the importance of using symmetry in the design of neural networks. |
Tasks | |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00560v1 |
https://arxiv.org/pdf/1907.00560v1.pdf | |
PWC | https://paperswithcode.com/paper/on-symmetry-and-initialization-for-neural |
Repo | |
Framework | |
Novel tracking approach based on fully-unsupervised disentanglement of the geometrical factors of variation
Title | Novel tracking approach based on fully-unsupervised disentanglement of the geometrical factors of variation |
Authors | Mykhailo Vladymyrov, Akitaka Ariga |
Abstract | Efficient tracking algorithms are a crucial part of particle tracking detectors. While a lot of work has been done in designing a plethora of algorithms, these usually require tedious tuning for each use case. (Weakly) supervised Machine Learning-based approaches can leverage the actual raw data for maximal performance. Yet in realistic scenarios, sufficient high-quality labeled data is not available. While training might be performed on simulated data, the reproduction of realistic signal and noise in the detector requires substantial effort, compromising this approach. Here we propose a novel, fully unsupervised, approach to track reconstruction. The introduced model for learning to disentangle the factors of variation in a geometrically meaningful way employs geometrical space invariances. We train it through constraints on the equivariance between the image space and the latent representation in a Deep Convolutional Autoencoder. Using experimental results on synthetic data we show that a combination of different space transformations is required for meaningful disentanglement of factors of variation. We also demonstrate the performance of our model on real data from tracking detectors. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04427v2 |
https://arxiv.org/pdf/1909.04427v2.pdf | |
PWC | https://paperswithcode.com/paper/novel-tracking-approach-based-on-fully |
Repo | |
Framework | |
Benchmark and Survey of Automated Machine Learning Frameworks
Title | Benchmark and Survey of Automated Machine Learning Frameworks |
Authors | Marc-André Zöller, Marco F. Huber |
Abstract | Machine learning (ML) has become a vital part in many aspects of our daily life. However, building well performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to automatically build machine learning applications without extensive knowledge of statistics and machine learning. This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets. Driven by the selected frameworks for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 different data sets. |
Tasks | AutoML |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1904.12054v2 |
https://arxiv.org/pdf/1904.12054v2.pdf | |
PWC | https://paperswithcode.com/paper/survey-on-automated-machine-learning |
Repo | |
Framework | |
Predicting e-commerce customer conversion from minimal temporal patterns on symbolized clickstream trajectories
Title | Predicting e-commerce customer conversion from minimal temporal patterns on symbolized clickstream trajectories |
Authors | Jacopo Tagliabue, Lucas Lacasa, Ciro Greco, Mattia Pavoni, Andrea Polonioli |
Abstract | Knowing if a user is a buyer or window shopper solely based on clickstream data is of crucial importance for e-commerce platforms seeking to implement real-time accurate NBA (next best action) policies. However, due to the low frequency of conversion events and the noisiness of browsing data, classifying user sessions is very challenging. In this paper, we address the clickstream classification problem in the eCommerce industry and present three major contributions to the burgeoning field of AI-for-retail: first, we collected, normalized and prepared a novel dataset of live shopping sessions from a major European e-commerce website; second, we use the dataset to test in a controlled environment strong baselines and SOTA models from the literature; finally, we propose a new discriminative neural model that outperforms neural architectures recently proposed at Rakuten labs. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02797v2 |
https://arxiv.org/pdf/1907.02797v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-e-commerce-customer-conversion |
Repo | |
Framework | |
Deep-Learning-Based Aerial Image Classification for Emergency Response Applications Using Unmanned Aerial Vehicles
Title | Deep-Learning-Based Aerial Image Classification for Emergency Response Applications Using Unmanned Aerial Vehicles |
Authors | Christos Kyrkou, Theocharis Theocharides |
Abstract | Unmanned Aerial Vehicles (UAVs) equipped with camera sensors can facilitate enhanced situational awareness for many emergency response and disaster management applications since they are capable of operating in remote and difficult-to-access areas. In addition, by utilizing an embedded platform and deep learning, UAVs can autonomously monitor a disaster-stricken area, analyze the image in real time, and alert in the presence of various calamities such as collapsed buildings, flood, or fire in order to mitigate their effects on the environment and on the human population more quickly. To this end, this paper focuses on the automated aerial scene classification of disaster events from on-board a UAV. Specifically, a dedicated Aerial Image Database for Emergency Response (AIDER) applications is introduced and a comparative analysis of existing approaches is performed. Through this analysis a lightweight convolutional neural network (CNN) architecture is developed, capable of running efficiently on an embedded platform, achieving ~3x higher performance compared to existing models with minimal memory requirements and less than a 2% accuracy drop compared to the state-of-the-art. These preliminary results provide a solid basis for further experimentation towards real-time aerial image classification for emergency response applications using UAVs. |
Tasks | Image Classification, Scene Classification |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08716v1 |
https://arxiv.org/pdf/1906.08716v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-aerial-image |
Repo | |
Framework | |
Online Robustness Training for Deep Reinforcement Learning
Title | Online Robustness Training for Deep Reinforcement Learning |
Authors | Marc Fischer, Matthew Mirman, Steven Stalder, Martin Vechev |
Abstract | In deep reinforcement learning (RL), adversarial attacks can trick an agent into unwanted states and disrupt training. We propose a system called Robust Student-DQN (RS-DQN), which permits online robustness training alongside Q networks, while preserving competitive performance. We show that RS-DQN can be combined with (i) state-of-the-art adversarial training and (ii) provably robust training to obtain an agent that is resilient to strong attacks during training and evaluation. |
Tasks | |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00887v3 |
https://arxiv.org/pdf/1911.00887v3.pdf | |
PWC | https://paperswithcode.com/paper/online-robustness-training-for-deep |
Repo | |
Framework | |
GLA in MediaEval 2018 Emotional Impact of Movies Task
Title | GLA in MediaEval 2018 Emotional Impact of Movies Task |
Authors | Jennifer J. Sun, Ting Liu, Gautam Prasad |
Abstract | The visual and audio information from movies can evoke a variety of emotions in viewers. Towards a better understanding of viewer impact, we present our methods for the MediaEval 2018 Emotional Impact of Movies Task to predict the expected valence and arousal continuously in movies. This task, using the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer impact from movies. Our approach leverages image, audio, and face based features computed using pre-trained neural networks. These features were computed over time and modeled using a gated recurrent unit (GRU) based network followed by a mixture of experts model to compute multiclass predictions. We smoothed these predictions using a Butterworth filter for our final result. Our method enabled us to achieve top performance in three evaluation metrics in the MediaEval 2018 task. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12361v1 |
https://arxiv.org/pdf/1911.12361v1.pdf | |
PWC | https://paperswithcode.com/paper/gla-in-mediaeval-2018-emotional-impact-of |
Repo | |
Framework | |
Angular Visual Hardness
Title | Angular Visual Hardness |
Authors | Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava, Jan Kautz, Anima Anandkumar |
Abstract | Recent convolutional neural networks (CNNs) have led to impressive performance but often suffer from poor calibration. They tend to be overconfident, with the model confidence not always reflecting the underlying true ambiguity and hardness. In this paper, we propose angular visual hardness (AVH), a score given by the normalized angular distance between the sample feature embedding and the target classifier to measure sample hardness. We validate this score with an in-depth and extensive scientific study, and observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-the-art models improve on the classification of harder examples. We observe that the training dynamics of AVH are vastly different compared to the training loss. Specifically, AVH quickly reaches a plateau for all samples even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. We also find that AVH has a statistically significant correlation with human visual hardness. Finally, we demonstrate the benefit of AVH to a variety of applications such as self-training for domain adaptation and domain generalization. |
Tasks | Calibration, Domain Adaptation, Domain Generalization |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02279v3 |
https://arxiv.org/pdf/1912.02279v3.pdf | |
PWC | https://paperswithcode.com/paper/angular-visual-hardness-1 |
Repo | |
Framework | |
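The "normalized angular distance" described in the abstract above can be made concrete with a small sketch. It assumes the normalization divides the angle to the target class weight by the sum of angles to all class weights; the abstract does not spell this out, and the function names and the list-of-weights layout are illustrative.

```python
import math
from typing import Sequence


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos)


def avh(embedding: Sequence[float],
        class_weights: Sequence[Sequence[float]],
        target: int) -> float:
    """Angle to the target class weight, normalized by the sum over all classes.

    Assumed normalization; lower values mean the sample sits closer
    (in angle) to its target classifier, i.e. an easier sample.
    """
    angles = [angle(embedding, w) for w in class_weights]
    return angles[target] / sum(angles)
```

For example, with two orthogonal class weights, an embedding aligned with its target class scores 0 and an embedding midway between the two classes scores 0.5. Note that the score ignores the embedding norm, so it is separate from the softmax confidence.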
Language Model Adaptation for Language and Dialect Identification of Text
Title | Language Model Adaptation for Language and Dialect Identification of Text |
Authors | Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen |
Abstract | This article describes an unsupervised language model adaptation approach that can be used to enhance the performance of language identification methods. The approach is applied to a current version of the HeLI language identification method, which is now called HeLI 2.0. We describe the HeLI 2.0 method in detail. The resulting system is evaluated using the datasets from the German dialect identification and Indo-Aryan language identification shared tasks of the VarDial workshops 2017 and 2018. The new approach with language identification provides considerably higher F1-scores than the previous HeLI method or the other systems which participated in the shared tasks. The results indicate that unsupervised language model adaptation should be considered as an option in all language identification tasks, especially in those where encountering out-of-domain data is likely. |
Tasks | Language Identification, Language Modelling |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10915v1 |
http://arxiv.org/pdf/1903.10915v1.pdf | |
PWC | https://paperswithcode.com/paper/language-model-adaptation-for-language-and |
Repo | |
Framework | |
An Exploration of State-of-the-art Methods for Offensive Language Detection
Title | An Exploration of State-of-the-art Methods for Offensive Language Detection |
Authors | Harrison Uglow, Martin Zlocha, Szymon Zmyślony |
Abstract | We provide a comprehensive investigation of different custom and off-the-shelf architectures as well as different approaches to generating feature vectors for offensive language detection. We also show that these approaches work well on small and noisy datasets such as on the Offensive Language Identification Dataset (OLID), so it should be possible to use them for other applications. |
Tasks | Language Identification |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.07445v2 |
http://arxiv.org/pdf/1903.07445v2.pdf | |
PWC | https://paperswithcode.com/paper/semeval-2019-task-6-an-exploration-of-state |
Repo | |
Framework | |
DCASE 2019: CNN depth analysis with different channel inputs for Acoustic Scene Classification
Title | DCASE 2019: CNN depth analysis with different channel inputs for Acoustic Scene Classification |
Authors | Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Maximo Cobos |
Abstract | The objective of this technical report is to describe the framework used in Task 1, Acoustic scene classification (ASC), of the DCASE 2019 challenge. The presented approach is based on Log-Mel spectrogram representations and VGG-based Convolutional Neural Networks (CNNs). Three different CNNs, with very similar architectures, have been implemented. The main difference is the number of filters in their convolutional blocks. Experiments show that the depth of the network is not the most relevant factor for improving the accuracy of the results. The performance seems to be more sensitive to the input audio representation. This conclusion is important for the implementation of real-time audio recognition and classification systems on edge devices. In the presented experiments the best audio representation is the Log-Mel spectrogram of the harmonic and percussive sources plus the Log-Mel spectrogram of the difference between left and right stereo-channels. Also, in order to improve accuracy, ensemble methods combining different model predictions with different inputs are explored. Besides geometric and arithmetic means, ensembles aggregated with the Orness Weighted Averaged (OWA) operator have shown interesting and novel results. The proposed framework outperforms the baseline system by 14.34 percentage points. For Task 1a, the obtained development accuracy is 76.84 percent, against a baseline of 62.5 percent, whereas the accuracy obtained on the public leaderboard is 77.33 percent, against a baseline of 64.33 percent. |
Tasks | Acoustic Scene Classification, Scene Classification |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04591v1 |
https://arxiv.org/pdf/1906.04591v1.pdf | |
PWC | https://paperswithcode.com/paper/dcase-2019-cnn-depth-analysis-with-different |
Repo | |
Framework | |
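The three ensemble rules named in the abstract above (arithmetic mean, geometric mean, and an OWA operator) can be compared on per-class probabilities with a short sketch. The OWA function here assumes the usual ordered weighted averaging definition (sort the scores in descending order, then take a weighted sum); the weights and all function names are hypothetical and not taken from the report.

```python
import math
from typing import Callable, Sequence


def arithmetic_mean(scores: Sequence[float]) -> float:
    return sum(scores) / len(scores)


def geometric_mean(scores: Sequence[float]) -> float:
    """Geometric mean; requires strictly positive scores."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to ranks, not to models."""
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def ensemble_predict(per_model_probs: Sequence[Sequence[float]],
                     combine: Callable[[Sequence[float]], float]) -> int:
    """Fuse each class's scores across models and return the best class index."""
    n_classes = len(per_model_probs[0])
    fused = [combine([probs[c] for probs in per_model_probs])
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)
```

A call such as `ensemble_predict(probs, lambda s: owa(s, [0.5, 0.3, 0.2]))` fuses three models with rank-based weights; with equal weights, OWA reduces to the arithmetic mean.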
The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy
Title | The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy |
Authors | Guoliang Feng, Wei Lu, Witold Pedrycz, Jianhua Yang, Xiaodong Liu |
Abstract | Numerous learning methods for fuzzy cognitive maps (FCMs), such as the Hebbian-based and the population-based learning methods, have been developed for modeling and simulating dynamic systems. However, these methods are faced with several obvious limitations. Most of these models are extremely time consuming when learning large-scale FCMs with hundreds of nodes. Furthermore, the FCMs learned by those algorithms lack robustness when the experimental data contain noise. In addition, reasonable distribution of the weights is rarely considered in these algorithms, which could reduce the performance of the resulting FCM. In this article, a straightforward, rapid, and robust learning method is proposed to learn FCMs from noisy data, and especially to learn large-scale FCMs. The crux of the proposed algorithm is to equivalently transform the learning problem of FCMs into a classic constrained convex optimization problem in which the least-squares term ensures the robustness of the well-learned FCM and the maximum entropy term regularizes the distribution of the weights of the well-learned FCM. A series of experiments covering two frequently used activation functions (the sigmoid and hyperbolic tangent functions) are performed on both synthetic datasets with noise and real-world datasets. The experimental results show that the proposed method is rapid and robust against data containing noise and that the well-learned weights have better distribution. In addition, the FCMs learned by the proposed method also exhibit superior performance in comparison with the existing methods. Index Terms: Fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08339v1 |
https://arxiv.org/pdf/1908.08339v1.pdf | |
PWC | https://paperswithcode.com/paper/the-learning-of-fuzzy-cognitive-maps-with |
Repo | |
Framework | |
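As background for the abstract above, the sketch below shows how an FCM is typically simulated once its weights are known: each node's next activation is a squashed weighted sum of the current activations. It does not implement the paper's learning method, and the weight convention (`weights[j][i]` is the influence of node j on node i) and function names are assumptions.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state: Sequence[float],
             weights: Sequence[Sequence[float]],
             activation: Callable[[float], float] = sigmoid) -> List[float]:
    """One synchronous update: A_i(t+1) = f(sum_j weights[j][i] * A_j(t))."""
    n = len(state)
    return [activation(sum(weights[j][i] * state[j] for j in range(n)))
            for i in range(n)]


def simulate(state: Sequence[float],
             weights: Sequence[Sequence[float]],
             steps: int,
             activation: Callable[[float], float] = sigmoid) -> List[List[float]]:
    """Return the trajectory of states, including the initial one."""
    trajectory = [list(state)]
    for _ in range(steps):
        trajectory.append(fcm_step(trajectory[-1], weights, activation))
    return trajectory
```

Passing `math.tanh` as `activation` gives the hyperbolic tangent variant mentioned in the abstract. Learning an FCM means recovering `weights` from observed trajectories like the one `simulate` returns.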
An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA
Title | An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA |
Authors | Shixiang Chen, Shiqian Ma, Lingzhou Xue, Hui Zou |
Abstract | Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experiment results are reported to demonstrate the advantages of our algorithm. |
Tasks | |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11576v1 |
http://arxiv.org/pdf/1903.11576v1.pdf | |
PWC | https://paperswithcode.com/paper/an-alternating-manifold-proximal-gradient |
Repo | |
Framework | |