Paper Group ANR 464
Sampled Softmax with Random Fourier Features
Title | Sampled Softmax with Random Fourier Features |
Authors | Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar |
Abstract | The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution. We show that RF-softmax leads to low bias in estimation in terms of both the full softmax distribution and the full softmax gradient. Furthermore, the cost of RF-softmax scales only logarithmically with the number of classes. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10747v2 |
https://arxiv.org/pdf/1907.10747v2.pdf | |
PWC | https://paperswithcode.com/paper/sampled-softmax-with-random-fourier-features |
Repo | |
Framework | |
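A minimal numpy sketch of the kernel-based sampling idea: draw negative classes with probability proportional to an RFF approximation of the softmax kernel, then apply the usual sampled-softmax log-correction. The shapes, constants, and the clamping of negative scores are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_classes, D, m = 16, 10_000, 128, 64   # embed dim, classes, RFF dim, samples

W = rng.normal(size=(D, d))                # RFF frequencies (Gaussian kernel)
b = rng.uniform(0, 2 * np.pi, size=D)
phi = lambda X: np.sqrt(2.0 / D) * np.cos(X @ W.T + b)   # phi(x).phi(y) ~ k(x, y)

C = rng.normal(scale=0.1, size=(n_classes, d))   # class embeddings
h = rng.normal(scale=0.1, size=d)                # input embedding
y = 3                                            # true class index

# Sampling distribution q_i proportional to phi(h).phi(c_i); clamping negative
# scores is a shortcut for this sketch only.
q = np.maximum(phi(C) @ phi(h), 1e-12)
q /= q.sum()

# Sampled negatives (a real implementation would exclude the true class).
neg = rng.choice(n_classes, size=m, replace=False, p=q)

# Log-corrected logits give a (nearly) unbiased sampled softmax cross entropy.
logits = np.concatenate(([C[y] @ h], C[neg] @ h - np.log(m * q[neg])))
loss = -logits[0] + np.log(np.exp(logits).sum())
```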
Learning Nearly Decomposable Value Functions Via Communication Minimization
Title | Learning Nearly Decomposable Value Functions Via Communication Minimization |
Authors | Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang |
Abstract | Reinforcement learning encounters major challenges in multi-agent settings, such as scalability and non-stationarity. Recently, value function factorization learning has emerged as a promising way to address these challenges in collaborative multi-agent systems. However, existing methods have focused on learning fully decentralized value functions, which are not efficient for tasks requiring communication. To address this limitation, this paper presents a novel framework for learning nearly decomposable value functions with communication, with which agents act on their own most of the time but occasionally send messages to other agents for effective coordination. This framework hybridizes value function factorization learning and communication learning by introducing two information-theoretic regularizers. These regularizers maximize the mutual information between decentralized Q functions and communication messages while minimizing the entropy of messages between agents. We show how to optimize these regularizers in a way that is easily integrated with existing value function factorization methods such as QMIX. Finally, we demonstrate that, on the StarCraft unit micromanagement benchmark, our framework significantly outperforms baseline methods and allows more than 80% of communication to be cut off without sacrificing performance. The video of our experiments is available at https://sites.google.com/view/ndvf. |
Tasks | Starcraft |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05366v1 |
https://arxiv.org/pdf/1910.05366v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-nearly-decomposable-value-functions-1 |
Repo | |
Framework | |
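A hedged PyTorch sketch of the two regularizers, assuming a Gaussian message model and a variational posterior; the paper's exact objective and architecture differ.

```python
import math
import torch
import torch.nn as nn

obs_dim, msg_dim, batch = 32, 4, 8
msg_enc = nn.Linear(obs_dim, 2 * msg_dim)   # history -> message mean, log-std
q_post = nn.Linear(1, 2 * msg_dim)          # variational posterior q(m | Q)

tau = torch.randn(batch, obs_dim)                 # agent trajectory encodings
mu, log_std = msg_enc(tau).chunk(2, dim=-1)
m = mu + log_std.exp() * torch.randn_like(mu)     # reparameterized message

q_val = torch.randn(batch, 1)                     # a decentralized Q estimate
mu_q, log_std_q = q_post(q_val).chunk(2, dim=-1)

# Regularizer 1 (minimize): entropy of the Gaussian message distribution.
entropy = (log_std + 0.5 * math.log(2 * math.pi * math.e)).sum(-1).mean()

# Regularizer 2 (maximize): a variational lower bound on I(message; Q),
# i.e., the log-likelihood of the sampled message under q(m | Q).
log_q = (-0.5 * ((m - mu_q) / log_std_q.exp()) ** 2
         - log_std_q - 0.5 * math.log(2 * math.pi)).sum(-1).mean()

reg_loss = -1.0 * log_q + 0.1 * entropy           # added to the usual TD loss
reg_loss.backward()
```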
Hide and Speak: Deep Neural Networks for Speech Steganography
Title | Hide and Speak: Deep Neural Networks for Speech Steganography |
Authors | Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet |
Abstract | Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier. Traditionally, digital signal processing techniques, such as least-significant-bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. To this end, we propose to jointly optimize two neural networks: the first network encodes the message inside a carrier, while the second network decodes the message from the modified carrier. We demonstrate the effectiveness of our method on several speech datasets and analyze the results quantitatively and qualitatively. Moreover, we show that our approach can be applied to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded messages are highly intelligible. |
Tasks | |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.03083v1 |
http://arxiv.org/pdf/1902.03083v1.pdf | |
PWC | https://paperswithcode.com/paper/hide-and-speak-deep-neural-networks-for |
Repo | |
Framework | |
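The joint objective can be sketched in a few lines of PyTorch; the linear layers below are made-up stand-ins for the paper's convolutional networks over speech spectrograms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 128
encoder = nn.Linear(2 * T, T)     # (carrier, message) -> stego carrier
decoder = nn.Linear(T, T)         # stego carrier -> recovered message

carrier, message = torch.randn(4, T), torch.randn(4, T)
stego = encoder(torch.cat([carrier, message], dim=-1))
recovered = decoder(stego)

# Keep the stego signal close to the carrier (imperceptibility) while making
# the decoded message match the hidden one (intelligibility).
loss = F.mse_loss(stego, carrier) + F.mse_loss(recovered, message)
loss.backward()
```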
Smart Grid Cyber Attacks Detection using Supervised Learning and Heuristic Feature Selection
Title | Smart Grid Cyber Attacks Detection using Supervised Learning and Heuristic Feature Selection |
Authors | Jacob Sakhnini, Hadis Karimipour, Ali Dehghantanha |
Abstract | False Data Injection (FDI) attacks are a common form of cyber-attack targeting smart grids. Stealthy FDI attacks cannot be detected by current bad-data detection systems. Machine learning is one of the alternative methods proposed to detect FDI attacks. This paper analyzes three different supervised learning techniques, each used with three different feature selection (FS) techniques. These methods are tested on the IEEE 14-bus, 57-bus, and 118-bus systems to evaluate their versatility. Classification accuracy is used as the main evaluation metric for each detection technique. The simulation study shows that supervised learning combined with heuristic FS methods improves the performance of the classification algorithms for FDI attack detection. |
Tasks | Feature Selection |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03313v1 |
https://arxiv.org/pdf/1907.03313v1.pdf | |
PWC | https://paperswithcode.com/paper/smart-grid-cyber-attacks-detection-using |
Repo | |
Framework | |
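An illustrative scikit-learn pipeline in the same spirit: feature selection followed by a supervised classifier, scored by accuracy. The paper's specific heuristic FS methods and classifiers are not reproduced here, and synthetic data stands in for bus measurements.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 54))           # e.g., measurement features of a test system
y = rng.integers(0, 2, size=500)         # 1 = FDI attack present

# Select the 20 most informative features, then classify with an SVM.
clf = make_pipeline(SelectKBest(mutual_info_classif, k=20), SVC())
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```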
Interpretations of Deep Learning by Forests and Haar Wavelets
Title | Interpretations of Deep Learning by Forests and Haar Wavelets |
Authors | Changcun Huang |
Abstract | This paper presents a basic property of the region dividing of ReLU (rectified linear unit) deep learning when new layers are successively added, from which two new perspectives for interpreting deep learning are derived. The first is related to decision trees and forests: we construct a deep learning structure equivalent to a forest in classification ability, which means that certain kinds of ReLU deep learning can be considered as forests. The second perspective is that functions represented by Haar wavelets can be approximated by ReLU deep learning with arbitrary precision, from which a general conclusion about the function approximation abilities of ReLU deep learning follows. Finally, we generalize some of these conclusions about ReLU deep learning to the case of sigmoid-unit deep learning. |
Tasks | |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06706v7 |
https://arxiv.org/pdf/1906.06706v7.pdf | |
PWC | https://paperswithcode.com/paper/a-general-interpretation-of-deep-learning-by |
Repo | |
Framework | |
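A worked illustration of the approximation claim (not the paper's exact construction): the step function underlying Haar wavelets is the limit of a difference of two ReLUs, so Haar-represented functions are approximable by ReLU networks to arbitrary precision.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def soft_step(x, a, k=1e4):
    # ramps from 0 to 1 over a width of 1/k around x = a; exact as k -> infinity
    return relu(k * (x - a)) - relu(k * (x - a) - 1.0)

def haar_mother(x, k=1e4):
    # psi(x) = step(x) - 2*step(x - 1/2) + step(x - 1): +1 on [0, 1/2), -1 on [1/2, 1)
    return soft_step(x, 0.0, k) - 2 * soft_step(x, 0.5, k) + soft_step(x, 1.0, k)

print(haar_mother(np.array([0.25, 0.75, 1.5])))   # ~ [1, -1, 0]
```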
Learning Video Representations using Contrastive Bidirectional Transformer
Title | Learning Video Representations using Contrastive Bidirectional Transformer |
Authors | Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid |
Abstract | This paper proposes a self-supervised learning approach for video features that results in significantly improved performance on downstream tasks (such as video classification, captioning and segmentation) compared to existing methods. Our method extends the BERT model for text sequences to the case of sequences of real-valued feature vectors, by replacing the softmax loss with noise contrastive estimation (NCE). We also show how to learn representations from sequences of visual features and sequences of words derived from ASR (automatic speech recognition), and show that such cross-modal training (when possible) helps even more. |
Tasks | Representation Learning, Speech Recognition, Video Captioning, Video Classification |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05743v2 |
https://arxiv.org/pdf/1906.05743v2.pdf | |
PWC | https://paperswithcode.com/paper/contrastive-bidirectional-transformer-for |
Repo | |
Framework | |
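The contrastive replacement for the softmax loss can be sketched with in-batch negatives (the standard InfoNCE form; the paper's NCE setup differs in detail).

```python
import torch
import torch.nn.functional as F

B, d = 8, 256
pred = torch.randn(B, d, requires_grad=True)   # transformer outputs at masked steps
target = torch.randn(B, d)                     # true real-valued visual features

# Score every prediction against every target: the diagonal entries are the
# positive pairs, all other rows act as noise samples.
logits = pred @ target.t()
loss = F.cross_entropy(logits, torch.arange(B))
loss.backward()
```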
Enhancing streamflow forecast and extracting insights using long-short term memory networks with data integration at continental scales
Title | Enhancing streamflow forecast and extracting insights using long-short term memory networks with data integration at continental scales |
Authors | Dapeng Feng, Kuai Fang, Chaopeng Shen |
Abstract | Recent observations with varied schedules and types (moving average, snapshot, or regularly spaced) can help improve streamflow forecasts, but they are difficult to integrate effectively. Based on a long short-term memory (LSTM) streamflow model, we tested different formulations in a flexible method we call data integration (DI) to integrate recent discharge measurements and improve forecasts. DI accepts lagged inputs either directly or through a convolutional neural network (CNN) unit. DI can ubiquitously elevate streamflow forecast performance to unseen levels, reaching a continental-scale median Nash-Sutcliffe coefficient of 0.86. Integrating moving-average discharge, discharge from a few days ago, or even the average discharge of the last calendar month could all improve daily forecasts. It turned out that directly using lagged observations as inputs was comparable in performance to using the CNN unit. Importantly, we obtained valuable insights regarding hydrologic processes impacting LSTM and DI performance. Before applying DI, the original LSTM worked well in mountainous and snow-dominated regions, but less so in regions with low discharge volumes (due to either low precipitation or high precipitation-energy synchronicity) and large inter-annual storage variability. DI was most beneficial in regions with high flow autocorrelation: it greatly reduced baseflow bias in groundwater-dominated western basins; it also improved the peaks for basins with dynamic surface water storage, e.g., the Prairie Potholes or Great Lakes regions. However, even DI cannot help high-aridity basins with one-day flash peaks. There is much promise in a deep-learning-based forecast paradigm due to its performance, automation, efficiency, and flexibility. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08949v2 |
https://arxiv.org/pdf/1912.08949v2.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-streamflow-forecast-and-extracting |
Repo | |
Framework | |
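A sketch of the "direct" DI variant, with illustrative shapes: a lagged discharge observation is appended to each day's forcings before the LSTM (the paper also routes lagged windows through a CNN unit).

```python
import torch
import torch.nn as nn

n_forcings, hidden = 5, 64
lstm = nn.LSTM(n_forcings + 1, hidden, batch_first=True)
head = nn.Linear(hidden, 1)

forcings = torch.randn(4, 365, n_forcings)   # precipitation, temperature, ...
lagged_q = torch.randn(4, 365, 1)            # discharge observed some days earlier
out, _ = lstm(torch.cat([forcings, lagged_q], dim=-1))
pred_q = head(out)                           # daily streamflow forecast
```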
Explainable Reinforcement Learning Through a Causal Lens
Title | Explainable Reinforcement Learning Through a Causal Lens |
Authors | Prashan Madumal, Tim Miller, Liz Sonenberg, Frank Vetere |
Abstract | Prevalent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen. In this paper, we use causal models to derive causal explanations of the behaviour of reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents’ behaviour. We investigated: 1) participants’ understanding gained from explanations through task prediction; 2) explanation satisfaction; and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models. |
Tasks | Starcraft, Starcraft II |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.10958v2 |
https://arxiv.org/pdf/1905.10958v2.pdf | |
PWC | https://paperswithcode.com/paper/explainable-reinforcement-learning-through-a |
Repo | |
Framework | |
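The counterfactual query pattern behind such explanations, on a toy structural causal model whose variables and equations are invented purely for illustration:

```python
# Toy SCM: the agent's action affects worker count, which affects income.
def scm(train_workers: bool, minerals: int = 50) -> int:
    # hypothetical structural equations, not learned from any real agent
    workers = 12 + (3 if train_workers else 0)
    return minerals + 2 * workers

factual = scm(train_workers=True)
counterfactual = scm(train_workers=False)     # "what if the agent had not?"
print(factual - counterfactual)               # causal effect cited in the explanation
```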
Semantically Interpretable Activation Maps: what-where-how explanations within CNNs
Title | Semantically Interpretable Activation Maps: what-where-how explanations within CNNs |
Authors | Diego Marcos, Sylvain Lobry, Devis Tuia |
Abstract | A main issue preventing the use of Convolutional Neural Networks (CNNs) in end-user applications is the low level of transparency in the decision process. Previous work on CNN interpretability has mostly focused either on localizing the regions of the image that contribute to the result or on building an external model that generates plausible explanations. However, the former does not provide any semantic information and the latter does not guarantee the faithfulness of the explanation. We propose an intermediate representation composed of multiple Semantically Interpretable Activation Maps (SIAM) indicating the presence of predefined attributes at different locations of the image. These attribute maps are then linearly combined to produce the final output. This gives the user insight into what the model has seen and where, with a final output directly linked to this information in a comprehensive and interpretable way. We test the method on the task of landscape scenicness (aesthetic value) estimation, using an intermediate representation of 33 attributes from the SUN Attributes database. The results confirm that SIAM makes it possible to understand what attributes in the image contribute to the final score and where they are located. Since it is based on learning from multiple tasks and datasets, SIAM improves the explainability of the prediction without additional annotation effort or computational overhead at inference time, while maintaining good performance on both the final and intermediate tasks. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08442v1 |
https://arxiv.org/pdf/1909.08442v1.pdf | |
PWC | https://paperswithcode.com/paper/semantically-interpretable-activation-maps |
Repo | |
Framework | |
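A structural sketch of SIAM: one activation map per semantic attribute, linearly combined into the final score. The single conv layer stands in for the paper's deep backbone; only the 33-attribute count is taken from the abstract.

```python
import torch
import torch.nn as nn

n_attr = 33
attr_maps = nn.Conv2d(3, n_attr, kernel_size=3, padding=1)
combine = nn.Linear(n_attr, 1)               # interpretable per-attribute weights

img = torch.randn(1, 3, 64, 64)
maps = attr_maps(img)                        # (1, 33, H, W): what and where
score = combine(maps.mean(dim=(2, 3)))       # pooled maps -> scenicness score
```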
Are Registration Uncertainty and Error Monotonically Associated?
Title | Are Registration Uncertainty and Error Monotonically Associated? |
Authors | Jie Luo, Sarah Frisken, Duo Wang, Alexandra Golby, Masashi Sugiyama, William M. Wells III |
Abstract | In image-guided neurosurgery, current commercial systems usually provide only rigid registration, partly because it is harder to predict, validate and understand non-rigid registration error. For instance, when surgeons see a discrepancy in aligned image features, they may not be able to distinguish between registration error and actual tissue deformation caused by tumor resection. In this case, the spatial distribution of registration error could help them make more informed decisions, e.g., ignoring the registration where the estimated error is high. However, error estimates are difficult to acquire. Probabilistic image registration (PIR) methods provide measures of registration uncertainty, which could be a surrogate for assessing the registration error. It is intuitive and believed by many clinicians that high uncertainty indicates a large error. However, the monotonic association between uncertainty and error has not been examined in the image registration literature. In this pilot study, we attempt to address this fundamental problem by looking at one PIR method, the Gaussian process (GP) registration. We systematically investigate the relation between GP uncertainty and error based on clinical data and show empirically that there is a weak-to-moderate positive monotonic correlation between point-wise GP registration uncertainty and non-rigid registration error. |
Tasks | Image Registration |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07709v2 |
https://arxiv.org/pdf/1908.07709v2.pdf | |
PWC | https://paperswithcode.com/paper/190807709 |
Repo | |
Framework | |
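The paper's question reduces to a rank (monotonic) correlation between point-wise uncertainty and registration error; a minimal check of that kind, with synthetic values standing in for the clinical data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
uncertainty = rng.gamma(2.0, 1.0, size=200)
error = 0.3 * uncertainty + rng.normal(scale=1.0, size=200)   # weak positive link

rho, p = spearmanr(uncertainty, error)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```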
One-Shot Texture Retrieval with Global Context Metric
Title | One-Shot Texture Retrieval with Global Context Metric |
Authors | Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao |
Abstract | In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image. To address this problem, we present an OS-TR network that encodes both reference and query images, achieving texture segmentation for the reference category. Unlike existing texture encoding methods that integrate CNNs with orderless pooling, we propose a directionality-aware module to capture the texture variations in each direction, resulting in a spatially invariant representation. To segment new categories given only a few examples, we incorporate a self-gating mechanism into a relation network to exploit global context information for adjusting the per-channel modulation weights of local relation features. Extensive experiments on benchmark texture datasets and real scenarios demonstrate our proposed method's above-par segmentation performance and robust generalization across domains. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06656v1 |
https://arxiv.org/pdf/1905.06656v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-texture-retrieval-with-global |
Repo | |
Framework | |
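A sketch of the self-gating step, written as a squeeze-and-excitation-style gate over the local relation features; the paper's exact OS-TR module differs in detail.

```python
import torch
import torch.nn as nn

C = 64
gate = nn.Sequential(nn.Linear(C, C // 4), nn.ReLU(),
                     nn.Linear(C // 4, C), nn.Sigmoid())

relation = torch.randn(1, C, 32, 32)                  # local reference/query relations
context = relation.mean(dim=(2, 3))                   # squeeze: global context vector
gated = relation * gate(context)[:, :, None, None]    # per-channel modulation
```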
Experimental Evaluation of Individualized Treatment Rules
Title | Experimental Evaluation of Individualized Treatment Rules |
Authors | Kosuke Imai, Michael Lingzhi Li |
Abstract | In recent years, the increasing availability of individual-level data has led to the rapid methodological development of individualized (or personalized) treatment rules (ITRs). These new tools are being deployed in a variety of fields including business, medicine, and politics. We propose to use a randomized experiment for evaluating the empirical performance of ITRs and quantifying its estimation uncertainty under Neyman's repeated sampling framework. Unlike existing methods, the proposed experimental evaluation requires neither modeling assumptions, asymptotic approximations, nor resampling methods. As a result, it is applicable to any ITR, including those based on complex machine learning algorithms. Our methodology also takes into account a budget constraint, which is an important consideration for policymakers with limited resources. Furthermore, we extend our theoretical results to the common situation in which ITRs are estimated via cross-validation using the same experimental data used for their evaluation. We show how to account for the additional uncertainty regarding the estimation of ITRs. Finally, we conduct a simulation study to demonstrate the accuracy of the proposed methodology in small samples. We also apply our methods to the Project STAR (Student-Teacher Achievement Ratio) experiment and compare the performance of ITRs based on several machine learning methods that are widely used for estimating heterogeneous treatment effects. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05389v2 |
https://arxiv.org/pdf/1905.05389v2.pdf | |
PWC | https://paperswithcode.com/paper/experimental-evaluation-of-individualized |
Repo | |
Framework | |
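The core idea can be sketched as inverse-probability weighting over a randomized experiment with known assignment probability; the paper adds exact variance formulas, budget constraints, and cross-validated rules. The data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_treat = 1000, 0.5
X = rng.normal(size=(n, 3))
T = rng.binomial(1, p_treat, size=n)             # randomized treatment
Y = X[:, 0] * T + rng.normal(size=n)             # outcomes, heterogeneous effect

itr = (X[:, 0] > 0).astype(int)                  # the fixed rule being evaluated
prob = np.where(itr == 1, p_treat, 1 - p_treat)  # P(T = itr(X))
value = np.mean(Y * (T == itr) / prob)           # IPW estimate of E[Y(itr(X))]
print(value)
```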
SoDeep: a Sorting Deep net to learn ranking loss surrogates
Title | SoDeep: a Sorting Deep net to learn ranking loss surrogates |
Authors | Martin Engilberge, Louis Chevallier, Patrick Pérez, Matthieu Cord |
Abstract | Several tasks in machine learning are evaluated using non-differentiable metrics such as mean average precision or Spearman correlation. However, their non-differentiability prevents them from being used as objective functions in a learning framework. Surrogate and relaxation methods exist but tend to be specific to a given metric. In the present work, we introduce a new method to learn approximations of such non-differentiable objective functions. Our approach is based on a deep architecture that approximates the sorting of arbitrary sets of scores. It is trained virtually for free using synthetic data. This sorting deep (SoDeep) net can then be combined in a plug-and-play manner with existing deep architectures. We demonstrate the interest of our approach in three different tasks that require ranking: cross-modal text-image retrieval, multi-label image classification, and visual memorability ranking. Our approach yields very competitive results on these three tasks, which validates the merit and the flexibility of SoDeep as a proxy for the sorting operation in ranking-based losses. |
Tasks | Image Classification, Image Retrieval |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04272v1 |
http://arxiv.org/pdf/1904.04272v1.pdf | |
PWC | https://paperswithcode.com/paper/sodeep-a-sorting-deep-net-to-learn-ranking |
Repo | |
Framework | |
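The recipe can be sketched as follows: train a small net on synthetic score vectors to predict their normalized ranks, then freeze it as a differentiable stand-in for sorting inside a ranking loss (the paper explores LSTM- and convolution-based sorters; an MLP stands in here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n = 10
sorter = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, n))
opt = torch.optim.Adam(sorter.parameters(), lr=1e-3)

for _ in range(1000):                              # synthetic supervision, "free" labels
    scores = torch.randn(128, n)
    ranks = scores.argsort(1).argsort(1).float() / (n - 1)
    loss = F.l1_loss(sorter(scores), ranks)
    opt.zero_grad(); loss.backward(); opt.step()

# Downstream, gradients flow through `sorter` where a hard sort would block them.
```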
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Title | Recommender Systems Fairness Evaluation via Generalized Cross Entropy |
Authors | Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani, Alejandro Bellogin, Tommaso Di Noia |
Abstract | Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). Regardless, the concept has been commonly interpreted as some form of equality – i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this paper, we argue that fairness in recommender systems does not necessarily imply equality, but instead should consider a distribution of resources based on merits and needs. We present a probabilistic framework based on generalized cross entropy to evaluate the fairness of recommender systems under this perspective, and we show that the proposed framework is flexible and explanatory: it allows domain knowledge to be incorporated (through an ideal fair distribution), which can help in understanding which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework in terms of both user and item fairness. |
Tasks | Recommendation Systems |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06708v1 |
https://arxiv.org/pdf/1908.06708v1.pdf | |
PWC | https://paperswithcode.com/paper/recommender-systems-fairness-evaluation-via |
Repo | |
Framework | |
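A hedged sketch of the evaluation idea: compare the distribution of recommendation benefit across user groups against an ideal fair distribution via a generalized cross entropy. The alpha-family form below is one common choice, assumed for illustration; see the paper for the exact definition it uses.

```python
import numpy as np

def gce(p_fair, p_model, alpha=2.0):
    # zero iff the two distributions coincide; magnitude grows with unfairness
    return (np.sum(p_fair**alpha * p_model**(1 - alpha)) - 1) / (alpha * (1 - alpha))

p_fair = np.array([0.5, 0.5])      # e.g., equal benefit for two user groups
p_model = np.array([0.7, 0.3])     # observed share of recommendation benefit
print(gce(p_fair, p_model))
```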
Using SMT Solvers to Validate Models for AI Problems
Title | Using SMT Solvers to Validate Models for AI Problems |
Authors | Andrei Arusoaie, Ionut Pistol |
Abstract | Artificial Intelligence problems, ranging from planning/scheduling to game control, include a crucial step: describing a model which accurately defines the problem’s required data, requirements, allowed transitions, and established goals. The ways in which a model can fail are numerous and often lead to a failure of search strategies to provide a quick, optimal, or even any solution. This paper proposes using SMT (Satisfiability Modulo Theories) solvers, such as Z3, to check the validity of a model. We propose two tests: checking whether a final (goal) state exists in the model’s described problem space, and checking whether the described transitions can provide a path from the identified initial states to any of the goal states (meaning a solution has been found). The advantage of using an SMT solver for AI model checking is that it substitutes for actual search strategies and works over an abstract representation of the model, that is, a set of logical formulas. Reasoning at an abstract level is not as expensive as exploring the entire solution space. SMT solvers use efficient decision procedures which provide proofs for the logical formulas corresponding to the AI model. A recent addition to Z3 allowed us to describe sequences of transitions as a recursive function, so we can check whether a solution can be found in the defined model. |
Tasks | |
Published | 2019-03-22 |
URL | http://arxiv.org/abs/1903.09475v1 |
http://arxiv.org/pdf/1903.09475v1.pdf | |
PWC | https://paperswithcode.com/paper/using-smt-solvers-to-validate-models-for-ai |
Repo | |
Framework | |
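A small Z3 illustration of the reachability check on a toy counter model, using bounded unrolling rather than the recursive-function encoding the paper mentions.

```python
from z3 import Ints, Or, Solver, sat

x0, x1, x2 = Ints("x0 x1 x2")
s = Solver()
s.add(x0 == 0)                              # initial state
for a, b in [(x0, x1), (x1, x2)]:           # two unrolled transitions: step +1 or -1
    s.add(Or(b == a + 1, b == a - 1))
s.add(x2 == 2)                              # goal state
print(s.check() == sat)                     # True: the path 0 -> 1 -> 2 exists
```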