Paper Group ANR 978
Iterative temporal differencing with random synaptic feedback weights support error backpropagation for deep learning. Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks. RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping. Making Asyn …
Iterative temporal differencing with random synaptic feedback weights support error backpropagation for deep learning
Title | Iterative temporal differencing with random synaptic feedback weights support error backpropagation for deep learning |
Authors | Aras R. Dargazany |
Abstract | This work shows that a differentiable activation function is no longer necessary for error backpropagation. The derivative of the activation function can be replaced by an iterative temporal differencing using fixed random feedback alignment. Combining fixed random synaptic feedback alignment with iterative temporal differencing transforms traditional error backpropagation into a more biologically plausible approach for learning deep neural network architectures. This can be a big step toward the integration of STDP-based error backpropagation in deep learning. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.07255v1 |
https://arxiv.org/pdf/1907.07255v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-temporal-differencing-with-random |
Repo | |
Framework | |
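The core idea above combines two substitutions in the backward pass: a fixed random feedback matrix in place of the transposed forward weights (feedback alignment), and a temporal difference of activations in place of the activation derivative. A minimal NumPy sketch of these two substitutions on a toy two-layer regressor is shown below; it illustrates the mechanism only and is not the paper's exact algorithm, with all sizes, data, and learning rates chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer regressor: x -> h = tanh(W1 x) -> y = W2 h
n_in, n_hid, n_out = 4, 8, 2
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
B = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback, replaces W2.T
lr = 0.05

X = rng.normal(size=(256, n_in))
Y = rng.normal(size=(256, n_out))

h_prev = np.zeros(n_hid)
for x, t in zip(X, Y):
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - t                                    # output error
    # Feedback alignment: project the error through the fixed random matrix B.
    # The activation derivative tanh'(W1 x) is replaced by a temporal difference
    # of hidden activations between consecutive presentations.
    delta_h = (B @ e) * (h - h_prev)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    h_prev = h
```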
Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks
Title | Multimodal Video-based Apparent Personality Recognition Using Long Short-Term Memory and Convolutional Neural Networks |
Authors | Süleyman Aslan, Uğur Güdükbay |
Abstract | Personality computing and affective computing, where the recognition of personality traits is essential, have gained increasing interest and attention in many research areas recently. We propose a novel approach to recognize the Big Five personality traits of people from videos. Personality and emotion affect the speaking style, facial expressions, body movements, and linguistic factors in social contexts, and they are affected by environmental elements. We develop a multimodal system to recognize apparent personality based on various modalities such as the face, environment, audio, and transcription features. We use modality-specific neural networks that learn to recognize the traits independently, and we obtain a final prediction of apparent personality with a feature-level fusion of these networks. We employ pre-trained deep convolutional neural networks such as ResNet and VGGish networks to extract high-level features and Long Short-Term Memory networks to integrate temporal information. We train the large model consisting of modality-specific subnetworks using a two-stage training process. We first train the subnetworks separately and then fine-tune the overall model using these trained networks. We evaluate the proposed method using the ChaLearn First Impressions V2 challenge dataset. Our approach obtains the best overall “mean accuracy” score, averaged over five personality traits, compared to the state-of-the-art. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00381v1 |
https://arxiv.org/pdf/1911.00381v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-video-based-apparent-personality |
Repo | |
Framework | |
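A hedged PyTorch sketch of the two-stage, feature-level fusion design summarized above: each modality-specific subnetwork runs an LSTM over pretrained per-frame features (e.g. ResNet or VGGish outputs) and is first trained with its own prediction head; the fusion model then concatenates the subnetworks' summary features and is fine-tuned end to end. Feature dimensions, hidden sizes, and the fusion head below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ModalitySubnet(nn.Module):
    """LSTM over per-frame features from a pretrained extractor (e.g. ResNet / VGGish)."""
    def __init__(self, feat_dim, hidden=128, n_traits=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_traits)      # used for stage-1 (separate) training

    def features(self, x):                           # x: (batch, time, feat_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]                                 # (batch, hidden) temporal summary

    def forward(self, x):                            # stage 1: per-modality trait scores
        return torch.sigmoid(self.head(self.features(x)))

class FusionModel(nn.Module):
    """Stage 2: feature-level fusion of the trained modality subnetworks."""
    def __init__(self, subnets, hidden=128, n_traits=5):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)
        self.fusion = nn.Sequential(
            nn.Linear(hidden * len(subnets), 256), nn.ReLU(),
            nn.Linear(256, n_traits), nn.Sigmoid())  # Big Five scores in [0, 1]

    def forward(self, inputs):                       # one (batch, time, feat) tensor per modality
        feats = [net.features(x) for net, x in zip(self.subnets, inputs)]
        return self.fusion(torch.cat(feats, dim=-1))
```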
RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping
Title | RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping |
Authors | Liliang Ren, Gen Sun, Jiaman Wu |
Abstract | Natural gradient has recently been introduced to the field of boosting to enable generic probabilistic prediction capability. Natural gradient boosting shows promising performance improvements on small datasets due to better training dynamics, but it suffers from a slow training speed overhead, especially for large datasets. We present a replication study of NGBoost (Duan et al., 2019) training that carefully examines the impacts of key hyperparameters under the circumstance of best-first decision tree learning. We find that with the regularization of leaf number clipping, the performance of NGBoost can be largely improved via a better choice of hyperparameters. Experiments show that our approach significantly beats the state-of-the-art performance on various kinds of datasets from the UCI Machine Learning Repository while still achieving up to a 4.85x speedup compared with the original approach of NGBoost. |
Tasks | |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02338v1 |
https://arxiv.org/pdf/1912.02338v1.pdf | |
PWC | https://paperswithcode.com/paper/rongba-a-robustly-optimized-natural-gradient |
Repo | |
Framework | |
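Leaf number clipping can be sketched with the ngboost package by handing NGBoost a best-first base tree whose leaf count is capped with scikit-learn's max_leaf_nodes; the hyperparameter values below are illustrative, not the tuned settings reported in the paper.

```python
from ngboost import NGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Best-first tree growth with a clipped number of leaves as the base learner.
base = DecisionTreeRegressor(criterion="friedman_mse", max_leaf_nodes=31)
ngb = NGBRegressor(Base=base, n_estimators=500, learning_rate=0.05)

X, y = fetch_california_housing(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ngb.fit(X_tr, y_tr)

print(ngb.predict(X_te)[:5])            # point predictions
print(ngb.pred_dist(X_te[:5]).params)   # parameters of the predictive distribution
```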
Making Asynchronous Stochastic Gradient Descent Work for Transformers
Title | Making Asynchronous Stochastic Gradient Descent Work for Transformers |
Authors | Alham Fikri Aji, Kenneth Heafield |
Abstract | Asynchronous stochastic gradient descent (SGD) is attractive from a speed perspective because workers do not wait for synchronization. However, the Transformer model converges poorly with asynchronous SGD, resulting in substantially lower quality compared to synchronous SGD. To investigate why this is the case, we isolate differences between asynchronous and synchronous methods, examining batch size and staleness effects. We find that summing several asynchronous updates, rather than applying them immediately, restores convergence behavior. With this hybrid method, Transformer training for the neural machine translation task reaches a near-convergence level 1.36x faster in single-node multi-GPU training with no impact on model quality. |
Tasks | Machine Translation |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03496v1 |
https://arxiv.org/pdf/1906.03496v1.pdf | |
PWC | https://paperswithcode.com/paper/making-asynchronous-stochastic-gradient |
Repo | |
Framework | |
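The hybrid scheme described above can be pictured as a parameter server that buffers incoming asynchronous gradients and applies their sum as one less frequent update instead of applying each gradient immediately. The sketch below is schematic: the queue, the accumulation size, and the plain SGD update are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def server_loop(gradient_queue, params, lr=1e-3, accumulate=4):
    """Apply the sum of `accumulate` asynchronous gradients as a single update."""
    buffer, count = np.zeros_like(params), 0
    while True:
        grad = gradient_queue.get()        # (possibly stale) gradient from a worker
        if grad is None:                   # sentinel: workers are done
            break
        buffer += grad
        count += 1
        if count == accumulate:            # summed update, applied once
            params -= lr * buffer
            buffer[:] = 0.0
            count = 0
    return params
```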
Feature relevance quantification in explainable AI: A causal problem
Title | Feature relevance quantification in explainable AI: A causal problem |
Authors | Dominik Janzing, Lenon Minorics, Patrick Blöbaum |
Abstract | We discuss promising recent contributions on quantifying feature relevance using Shapley values, where we observed some confusion on which probability distribution is the right one for dropped features. We argue that the confusion is based on not carefully distinguishing between observational and interventional conditional probabilities and try a clarification based on Pearl’s seminal work on causality. We conclude that unconditional rather than conditional expectations provide the right notion of dropping features in contradiction to the theoretical justification of the software package SHAP. Parts of SHAP are unaffected because unconditional expectations (which we argue to be conceptually right) are used as approximation for the conditional ones, which encouraged others to “improve” SHAP in a way that we believe to be flawed. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13413v2 |
https://arxiv.org/pdf/1910.13413v2.pdf | |
PWC | https://paperswithcode.com/paper/191013413 |
Repo | |
Framework | |
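The distinction above can be made concrete with the value function used inside Shapley-value computations. Under the unconditional (interventional) reading, features outside the coalition S are replaced by draws from a background sample, breaking their dependence on the retained features; a minimal sketch is below, where the model is assumed to be a vectorized callable and all names are illustrative.

```python
import numpy as np

def value_interventional(model, x, S, background, n_samples=200, rng=None):
    """v(S) = E[f(x_S, X_rest)] with X_rest drawn from the marginal (background) data."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.integers(0, len(background), size=n_samples)
    X = background[idx].copy()        # dropped features come from the background sample
    X[:, list(S)] = x[list(S)]        # retained features are fixed to the instance x
    return float(model(X).mean())
```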
Assessing Social and Intersectional Biases in Contextualized Word Representations
Title | Assessing Social and Intersectional Biases in Contextualized Word Representations |
Authors | Yi Chern Tan, L. Elisa Celis |
Abstract | Social bias in machine learning has drawn significant attention, with work ranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks. These word representations are conditioned on their context within a sentence, and can also be used to encode the entire sentence. In this paper, we analyze the extent to which state-of-the-art models for contextual word representations, such as BERT and GPT-2, encode biases with respect to gender, race, and intersectional identities. Towards this, we propose assessing bias at the contextual word level. This novel approach captures the contextual effects of bias missing in context-free word embeddings, yet avoids confounding effects that underestimate bias at the sentence encoding level. We demonstrate evidence of bias at the corpus level, find varying evidence of bias in embedding association tests, show in particular that racial bias is strongly encoded in contextual word models, and observe that bias effects for intersectional minorities are exacerbated beyond their constituent minority identities. Further, evaluating bias effects at the contextual word level captures biases that are not captured at the sentence level, confirming the need for our novel approach. |
Tasks | Word Embeddings |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01485v1 |
https://arxiv.org/pdf/1911.01485v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-social-and-intersectional-biases-in |
Repo | |
Framework | |
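A rough sketch of extracting contextual word-level representations with the Hugging Face transformers library and scoring a simple association between a target word and two attribute words. The template sentence, the mean pooling over word pieces, and the cosine-difference score are simplifications for illustration and do not reproduce the paper's corpus-level or embedding-association-test methodology.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vec(sentence, word):
    """Contextual embedding of `word` inside `sentence` (assumes the word appears in it)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]       # (tokens, dim)
    ids = tok(word, add_special_tokens=False)["input_ids"]
    pos = [i for i, t in enumerate(enc["input_ids"][0].tolist()) if t in ids]
    return hidden[pos].mean(0)                           # mean over the word's pieces

def association(target, attr_a, attr_b, template="The {} is here."):
    """Positive value: `target` sits closer to `attr_a` than to `attr_b` in context."""
    t = word_vec(template.format(target), target)
    a = word_vec(template.format(attr_a), attr_a)
    b = word_vec(template.format(attr_b), attr_b)
    cos = torch.nn.functional.cosine_similarity
    return (cos(t, a, dim=0) - cos(t, b, dim=0)).item()
```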
JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video
Title | JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video |
Authors | Soo Ye Kim, Jihyong Oh, Munchurl Kim |
Abstract | Joint learning of super-resolution (SR) and inverse tone-mapping (ITM) has been explored recently, to convert legacy low resolution (LR) standard dynamic range (SDR) videos to high resolution (HR) high dynamic range (HDR) videos for the growing need of UHD HDR TV/broadcasting applications. However, previous CNN-based methods directly reconstruct the HR HDR frames from LR SDR frames, and are only trained with a simple L2 loss. In this paper, we take a divide-and-conquer approach in designing a novel GAN-based joint SR-ITM network, called JSI-GAN, which is composed of three task-specific subnets: an image reconstruction subnet, a detail restoration (DR) subnet and a local contrast enhancement (LCE) subnet. We delicately design these subnets so that they are appropriately trained for the intended purpose, learning a pair of pixel-wise 1D separable filters via the DR subnet for detail restoration and a pixel-wise 2D local filter by the LCE subnet for contrast enhancement. Moreover, to train the JSI-GAN effectively, we propose a novel detail GAN loss alongside the conventional GAN loss, which helps enhance both local details and contrast to reconstruct high-quality HR HDR results. When all subnets are jointly trained, HR HDR results of higher quality are obtained, with at least a 0.41 dB gain in PSNR over those generated by previous methods. |
Tasks | Image Reconstruction, Super-Resolution |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04391v2 |
https://arxiv.org/pdf/1909.04391v2.pdf | |
PWC | https://paperswithcode.com/paper/jsi-gan-gan-based-joint-super-resolution-and |
Repo | |
Framework | |
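The pixel-wise filtering used by the DR and LCE subnets can be sketched independently of the full network: a subnet predicts a small filter per output pixel, and each output value is the weighted sum of the input's local neighbourhood under that filter. The PyTorch sketch below shows the 2D local-filter case with illustrative shapes; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def apply_pixelwise_filter(img, filters, k=5):
    """img: (B, C, H, W); filters: (B, k*k, H, W), one k x k filter per pixel."""
    B, C, H, W = img.shape
    patches = F.unfold(img, k, padding=k // 2)            # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H, W)
    out = (patches * filters.unsqueeze(1)).sum(dim=2)     # weighted sum over each window
    return out                                            # (B, C, H, W)

img = torch.randn(1, 3, 64, 64)
filters = torch.softmax(torch.randn(1, 25, 64, 64), dim=1)  # would be predicted by the LCE subnet
print(apply_pixelwise_filter(img, filters).shape)
```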
Disentanglement Challenge: From Regularization to Reconstruction
Title | Disentanglement Challenge: From Regularization to Reconstruction |
Authors | Jie Qiao, Zijian Li, Boyan Xu, Ruichu Cai, Kun Zhang |
Abstract | The challenge of learning disentangled representations has recently attracted much attention and boils down to a competition using a new real-world disentanglement dataset (Gondal et al., 2019). Various methods based on variational auto-encoders have been proposed to solve this problem by enforcing independence between the components of the representation and modifying the regularization term in the variational lower bound. However, recent work by Locatello et al. (2018) has demonstrated that the proposed methods are heavily influenced by randomness and the choice of hyperparameters. In this work, instead of designing a new regularization term, we adopt FactorVAE but improve the reconstruction performance by increasing the capacity of the network and the number of training steps. The strategy turns out to be very effective and achieves 1st place in the challenge. |
Tasks | |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00155v1 |
https://arxiv.org/pdf/1912.00155v1.pdf | |
PWC | https://paperswithcode.com/paper/disentanglement-challenge-from-regularization |
Repo | |
Framework | |
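For context, the FactorVAE objective adopted above augments the usual variational lower bound with a total-correlation penalty on the aggregate posterior (weighted by $\gamma$ and estimated adversarially with a discriminator in the original FactorVAE); the entry keeps this objective and instead tunes reconstruction capacity and training length:

$$\mathcal{L} = \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big) - \gamma\, D_{\mathrm{KL}}\Big(q(z)\,\Big\|\, \textstyle\prod_{j} q(z_j)\Big)$$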
Personalized Neural Embeddings for Collaborative Filtering with Text
Title | Personalized Neural Embeddings for Collaborative Filtering with Text |
Authors | Guangneng Hu |
Abstract | Collaborative filtering (CF) is a core technique for recommender systems. Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and views) only and hence they suffer from the data sparsity issue. Items are usually associated with unstructured text such as article abstracts and product reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. We learn such embeddings of users, items, and words jointly, and predict user preferences on items based on these learned representations. PNE estimates the probability that a user will like an item by two terms—behavior factors and semantic factors. On two real-world datasets, PNE shows better performance than four state-of-the-art baselines in terms of three metrics. We also show that PNE learns meaningful word embeddings by visualization. |
Tasks | Recommendation Systems, Word Embeddings |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07860v1 |
http://arxiv.org/pdf/1903.07860v1.pdf | |
PWC | https://paperswithcode.com/paper/personalized-neural-embeddings-for |
Repo | |
Framework | |
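A minimal PyTorch sketch of the two-term scoring idea: the preference probability combines a behavior factor built from user and item embeddings with a semantic factor built from the item's associated words. Embedding sizes, the element-wise interaction, and mean pooling over words are illustrative assumptions rather than the PNE architecture itself.

```python
import torch
import torch.nn as nn

class PNESketch(nn.Module):
    def __init__(self, n_users, n_items, vocab, dim=64):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        self.word = nn.Embedding(vocab, dim)
        self.out = nn.Linear(2 * dim, 1)

    def forward(self, u, i, words):                 # words: (batch, n_words) ids of item text
        behavior = self.user(u) * self.item(i)      # behavior factor: user-item interaction
        semantic = self.word(words).mean(dim=1)     # semantic factor: pooled text representation
        score = self.out(torch.cat([behavior, semantic], dim=-1))
        return torch.sigmoid(score).squeeze(-1)     # probability the user likes the item
```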
Deep Reinforcement Learning for Control of Probabilistic Boolean Networks
Title | Deep Reinforcement Learning for Control of Probabilistic Boolean Networks |
Authors | Georgios Papagiannis, Sotiris Moschoyiannis |
Abstract | Probabilistic Boolean Networks (PBNs) were introduced as a computational model for studying gene interactions in Gene Regulatory Networks (GRNs). Controllability of PBNs, and hence GRNs, is the process of making strategic interventions to a network in order to drive it from a particular state towards some other potentially more desirable state. This is of significant importance to systems biology as successful control could be used to obtain potential gene treatments by making therapeutic interventions. Recent advancements in Deep Reinforcement Learning have enabled systems to develop policies merely by interacting with the environment, without complete knowledge of the underlying Markov Decision Process (MDP). In this paper we propose the use of a Deep Q Network with Double Q Learning, that directly interacts with the environment - that is, a Probabilistic Boolean Network. The proposed approach is trained by sampling experiences obtained from the environment using Prioritised Experience Replay and successfully determines a control policy that directs a PBN from any state to the desired state (attractor). We demonstrate successful results on significantly larger PBNs compared to previous approaches under our control framework. |
Tasks | Q-Learning |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03331v4 |
https://arxiv.org/pdf/1909.03331v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-control-of |
Repo | |
Framework | |
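The Double Q-learning target used when training such a DQN controller is small enough to sketch directly: the online network selects the next action and the target network evaluates it, which reduces over-estimation. In the setting above, the state would be a PBN configuration and the done flag would mark reaching the desired attractor; prioritized experience replay would additionally weight the resulting TD errors. The function below is a generic sketch, not the authors' code.

```python
import torch

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """reward, done: (batch,) float tensors; next_state: (batch, state_dim)."""
    with torch.no_grad():
        next_a = online_net(next_state).argmax(dim=1, keepdim=True)    # select with online net
        next_q = target_net(next_state).gather(1, next_a).squeeze(1)   # evaluate with target net
        return reward + gamma * (1.0 - done) * next_q
```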
Gradient Q$(σ, λ)$: A Unified Algorithm with Function Approximation for Reinforcement Learning
Title | Gradient Q$(σ, λ)$: A Unified Algorithm with Function Approximation for Reinforcement Learning |
Authors | Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan |
Abstract | Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case; for large-scale learning, Q$(\sigma,\lambda)$ is too expensive, requiring a huge volume of tables to accurately store value functions. To address the above problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$, with its combination of full sampling and pure expectation, reaches better performance than full-sampling and pure-expectation methods alone. |
Tasks | Q-Learning |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02877v1 |
https://arxiv.org/pdf/1909.02877v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-q-a-unified-algorithm-with-function |
Repo | |
Framework | |
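For orientation, the one-step backup that the sampling degree $\sigma$ interpolates (written here without the eligibility-trace machinery of the full Q$(\sigma,\lambda)$ algorithm) is

$$G_t = R_{t+1} + \gamma\Big[\sigma\, Q(S_{t+1}, A_{t+1}) + (1-\sigma)\sum_{a}\pi(a \mid S_{t+1})\, Q(S_{t+1}, a)\Big],$$

so $\sigma = 1$ gives a fully sampled backup and $\sigma = 0$ a pure-expectation backup; GQ$(\sigma,\lambda)$ carries this over to linear function approximation, $Q(s,a) \approx \theta^{\top}\phi(s,a)$.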
Improved mutual information measure for classification and community detection
Title | Improved mutual information measure for classification and community detection |
Authors | M. E. J. Newman, George T. Cantwell, Jean Gabriel Young |
Abstract | The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications. |
Tasks | Community Detection |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12581v1 |
https://arxiv.org/pdf/1907.12581v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-mutual-information-measure-for |
Repo | |
Framework | |
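The standard quantity the paper corrects is the mutual information between two labelings computed from their contingency table; a plain NumPy sketch is below (in nats). The corrected measure proposed in the paper adds a further term and is not reproduced here.

```python
import numpy as np

def mutual_information(labels_a, labels_b):
    """Standard mutual information between two labelings of the same objects."""
    a_vals, a_inv = np.unique(labels_a, return_inverse=True)
    b_vals, b_inv = np.unique(labels_b, return_inverse=True)
    n = len(labels_a)
    cont = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(cont, (a_inv, b_inv), 1)            # contingency table counts
    p_rs = cont / n
    p_r = p_rs.sum(axis=1, keepdims=True)
    p_s = p_rs.sum(axis=0, keepdims=True)
    mask = p_rs > 0
    return float((p_rs[mask] * np.log(p_rs[mask] / (p_r @ p_s)[mask])).sum())
```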
Contextual Combinatorial Conservative Bandits
Title | Contextual Combinatorial Conservative Bandits |
Authors | Xiaojin Zhang, Shuai Li, Weiwen Liu |
Abstract | The problem of multi-armed bandits (MAB) requires making sequential decisions while balancing exploitation and exploration, and has been successfully applied to a wide range of practical scenarios. Various algorithms have been designed to achieve a high reward in the long term. However, their short-term performance might be rather low, which is injurious in risk-sensitive applications. Building on previous work on conservative bandits, we propose a framework of contextual combinatorial conservative bandits. An algorithm is presented and a regret bound of $\tilde O(d^2+d\sqrt{T})$ is proven, where $d$ is the dimension of the feature vectors, and $T$ is the total number of time steps. We further provide an algorithm as well as a regret analysis for the case when the conservative reward is unknown. Experiments are conducted, and the results validate the effectiveness of our algorithm. |
Tasks | Multi-Armed Bandits |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11337v1 |
https://arxiv.org/pdf/1911.11337v1.pdf | |
PWC | https://paperswithcode.com/paper/contextual-combinatorial-conservative-bandits |
Repo | |
Framework | |
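In its simplest (non-contextual, non-combinatorial) form, the conservative constraint underlying this family of algorithms permits an exploratory action only when a lower bound on the cumulative reward of deviating stays within a $(1-\alpha)$ fraction of what the known baseline policy guarantees; the sketch below is that generic check with illustrative names, not the paper's algorithm.

```python
def conservative_choice(ucb_action, baseline_action,
                        cum_reward_lcb_if_deviate, cum_baseline_reward, alpha=0.1):
    """Play the UCB-chosen action only if the safety budget allows it."""
    if cum_reward_lcb_if_deviate >= (1 - alpha) * cum_baseline_reward:
        return ucb_action          # enough slack: explore
    return baseline_action         # otherwise fall back to the baseline policy
```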
Corruption Robust Exploration in Episodic Reinforcement Learning
Title | Corruption Robust Exploration in Episodic Reinforcement Learning |
Authors | Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun |
Abstract | We initiate the study of multi-stage episodic reinforcement learning under adversarial manipulations in both the rewards and the transition probabilities of the underlying system. Existing efficient algorithms heavily rely on the “optimism under uncertainty” principle which dictates their behavior and does not allow flexibility to perform corruption-robust exploration. We address this by (i) departing from the optimistic behavior, and (ii) creating a general framework that incorporates the principle of action-elimination. (This principle has been essential for corruption-robust exploration in multi-armed bandits, a degenerate special case of episodic reinforcement learning.) Despite constructing a lower bound for a straightforward implementation of action-elimination, we provide a clean and modular way to transfer it to episodic reinforcement learning. Our algorithm enjoys near-optimal guarantees in the absence of adversarial manipulations, has performance that degrades gracefully as the amount of corruption increases, and does not need to know this amount. Our results shed new light on the broader question of robust exploration, and suggest a way to address a rather daunting mismatch between optimistic algorithms and algorithms with higher flexibility. To demonstrate the applicability of our framework, we provide a second instantiation thereof, showing how it can provide efficient guarantees for the stochastic setting, despite doing almost uniform exploration across plausibly optimal actions. |
Tasks | Multi-Armed Bandits |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08689v1 |
https://arxiv.org/pdf/1911.08689v1.pdf | |
PWC | https://paperswithcode.com/paper/corruption-robust-exploration-in-episodic |
Repo | |
Framework | |
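The action-elimination principle referenced above is easiest to state in its degenerate multi-armed bandit form: arms whose upper confidence bound falls below the best lower confidence bound are removed from play. The NumPy sketch below shows that elimination rule with a standard Hoeffding-style radius; it illustrates the principle only, not the episodic-RL algorithm of the paper.

```python
import numpy as np

def active_arms(means, counts, t, delta=0.05):
    """Keep only arms that are still plausibly optimal at round t."""
    rad = np.sqrt(np.log(2 * len(means) * t ** 2 / delta) / (2 * np.maximum(counts, 1)))
    ucb, lcb = means + rad, means - rad
    return np.flatnonzero(ucb >= lcb.max())
```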
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Title | A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality |
Authors | Navid Azizan, Babak Hassibi |
Abstract | Stochastic mirror descent (SMD) is a fairly new family of algorithms that has recently found a wide range of applications in optimization, machine learning, and control. It can be considered a generalization of the classical stochastic gradient algorithm (SGD), where instead of updating the weight vector along the negative direction of the stochastic gradient, the update is performed in a “mirror domain” defined by the gradient of a (strictly convex) potential function. This potential function, and the mirror domain it yields, provides considerable flexibility in the algorithm compared to SGD. While many properties of SMD have already been obtained in the literature, in this paper we exhibit a new interpretation of SMD, namely that it is a risk-sensitive optimal estimator when the unknown weight vector and additive noise are non-Gaussian and belong to the exponential family of distributions. The analysis also suggests a modified version of SMD, which we refer to as symmetric SMD (SSMD). The proofs rely on some simple properties of Bregman divergence, which allow us to extend results from quadratics and Gaussians to certain convex functions and exponential families in a rather seamless way. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01855v1 |
http://arxiv.org/pdf/1904.01855v1.pdf | |
PWC | https://paperswithcode.com/paper/a-stochastic-interpretation-of-stochastic |
Repo | |
Framework | |
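For reference, the SMD update described above can be written in one line: with a strictly convex potential $\psi$ and instantaneous loss $L_t$,

$$\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\, \nabla L_t(w_t),$$

so the choice $\psi(w) = \tfrac{1}{2}\|w\|^2$ recovers plain SGD, while other potentials change the geometry of the update.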