Paper Group ANR 655
Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis
Title | Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis |
Authors | Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, Bao-Liang Lu |
Abstract | Multimodal signals are more powerful than unimodal data for emotion recognition since they can represent emotions more comprehensively. In this paper, we introduce deep canonical correlation analysis (DCCA) to multimodal emotion recognition. The basic idea behind DCCA is to transform each modality separately and coordinate different modalities into a hyperspace by using specified canonical correlation analysis constraints. We evaluate the performance of DCCA on five multimodal datasets: the SEED, SEED-IV, SEED-V, DEAP, and DREAMER datasets. Our experimental results demonstrate that DCCA achieves state-of-the-art recognition accuracy rates on all five datasets: 94.58% on the SEED dataset, 87.45% on the SEED-IV dataset, 84.33% and 85.62% for two binary classification tasks and 88.51% for a four-category classification task on the DEAP dataset, 83.08% on the SEED-V dataset, and 88.99%, 90.57%, and 90.67% for three binary classification tasks on the DREAMER dataset. We also compare the noise robustness of DCCA with that of existing methods when adding various amounts of noise to the SEED-V dataset. The experimental results indicate that DCCA has greater robustness. By visualizing feature distributions with t-SNE and calculating the mutual information between different modalities before and after using DCCA, we find that the features transformed by DCCA from different modalities are more homogeneous and discriminative across emotions. |
Tasks | Emotion Recognition, Multimodal Emotion Recognition |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.05349v1 |
https://arxiv.org/pdf/1908.05349v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-emotion-recognition-using-deep |
Repo | |
Framework | |
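As a hedged illustration of the coordinated-representation idea above, the sketch below pairs two small PyTorch encoders with a CCA-style correlation loss (the negative sum of canonical correlations between the transformed views). The 310-d and 33-d input sizes and the network widths are hypothetical stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn

def cca_loss(H1, H2, eps=1e-4):
    """Negative sum of canonical correlations between two views (to be minimized)."""
    n, d1, d2 = H1.size(0), H1.size(1), H2.size(1)
    H1, H2 = H1 - H1.mean(0), H2 - H2.mean(0)
    S11 = H1.t() @ H1 / (n - 1) + eps * torch.eye(d1)
    S22 = H2.t() @ H2 / (n - 1) + eps * torch.eye(d2)
    S12 = H1.t() @ H2 / (n - 1)
    # Whitened cross-covariance T = S11^{-1/2} S12 S22^{-1/2};
    # its singular values are the canonical correlations.
    e1, V1 = torch.linalg.eigh(S11)
    e2, V2 = torch.linalg.eigh(S22)
    S11_isqrt = V1 @ torch.diag(e1.clamp_min(eps).rsqrt()) @ V1.t()
    S22_isqrt = V2 @ torch.diag(e2.clamp_min(eps).rsqrt()) @ V2.t()
    T = S11_isqrt @ S12 @ S22_isqrt
    return -torch.linalg.svdvals(T).sum()

# Hypothetical modality encoders: 310-d EEG features and 33-d eye-movement features.
eeg_net = nn.Sequential(nn.Linear(310, 128), nn.ReLU(), nn.Linear(128, 20))
eye_net = nn.Sequential(nn.Linear(33, 64), nn.ReLU(), nn.Linear(64, 20))
opt = torch.optim.Adam(list(eeg_net.parameters()) + list(eye_net.parameters()), lr=1e-3)

eeg, eye = torch.randn(256, 310), torch.randn(256, 33)   # stand-in batch
loss = cca_loss(eeg_net(eeg), eye_net(eye))
opt.zero_grad(); loss.backward(); opt.step()
```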
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Title | Self-Supervised Learning by Cross-Modal Audio-Video Clustering |
Authors | Humam Alwassel, Dhruv Mahajan, Lorenzo Torresani, Bernard Ghanem, Du Tran |
Abstract | The visual and audio modalities are highly correlated yet they contain different information. Their strong correlation makes it possible to predict the semantics of one from the other with good accuracy. Their intrinsic differences make cross-modal prediction a potentially more rewarding pretext task for self-supervised learning of video and audio representations compared to within-modality learning. Based on this intuition, we propose Cross-Modal Deep Clustering (XDC), a novel self-supervised method that leverages unsupervised clustering in one modality (e.g. audio) as a supervisory signal for the other modality (e.g. video). This cross-modal supervision helps XDC utilize the semantic correlation and the differences between the two modalities. Our experiments show that XDC significantly outperforms single-modality clustering and other multi-modal variants. Our XDC achieves state-of-the-art accuracy among self-supervised methods on several video and audio benchmarks including HMDB51, UCF101, ESC50, and DCASE. Most importantly, the video model pretrained with XDC significantly outperforms the same model pretrained with full-supervision on both ImageNet and Kinetics in action recognition on HMDB51 and UCF101. To the best of our knowledge, XDC is the first method to demonstrate that self-supervision outperforms large-scale full-supervision in representation learning for action recognition. |
Tasks | Representation Learning |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12667v1 |
https://arxiv.org/pdf/1911.12667v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-by-cross-modal-audio |
Repo | |
Framework | |
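A minimal sketch of the cross-modal clustering loop described above, assuming precomputed clip features and small linear encoders in place of the paper's video and audio backbones; the k-means clusters of one modality serve as pseudo-labels for the other. The feature dimensions, cluster count, and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def xdc_pseudolabels(features, k=64):
    """Cluster one modality's features; the cluster ids supervise the other modality."""
    return KMeans(n_clusters=k, n_init=10).fit_predict(features)

# Stand-in encoders over precomputed clip features; the real method trains deep
# video/audio backbones and a separate classification head per modality.
video_enc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))
audio_enc = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.SGD(list(video_enc.parameters()) + list(audio_enc.parameters()), lr=0.01)
ce = nn.CrossEntropyLoss()

video_feats = torch.randn(1000, 512)   # placeholder unlabeled clips
audio_feats = torch.randn(1000, 128)

for epoch in range(5):
    # Alternate supervision: audio clusters label the video head, and vice versa.
    y_from_audio = torch.as_tensor(
        xdc_pseudolabels(audio_enc(audio_feats).detach().numpy()), dtype=torch.long)
    y_from_video = torch.as_tensor(
        xdc_pseudolabels(video_enc(video_feats).detach().numpy()), dtype=torch.long)
    loss = ce(video_enc(video_feats), y_from_audio) + ce(audio_enc(audio_feats), y_from_video)
    opt.zero_grad(); loss.backward(); opt.step()
```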
ACES – Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning
Title | ACES – Automatic Configuration of Energy Harvesting Sensors with Reinforcement Learning |
Authors | Francesco Fraternali, Bharathan Balaji, Yuvraj Agarwal, Rajesh K. Gupta |
Abstract | The Internet of Things forms the backbone of modern building applications. Wireless sensors are being increasingly adopted for their flexibility and reduced cost of deployment. However, most wireless sensors today are powered by batteries, and large deployments are inhibited by manual battery replacement. Energy harvesting sensors provide an attractive alternative, but they need to provide adequate quality of service to applications given uncertain energy availability. We propose using reinforcement learning to optimize the operation of energy harvesting sensors to maximize sensing quality with available energy. We present our system ACES, which uses reinforcement learning for periodic and event-driven sensing indoors with ambient light energy harvesting. Our custom-built board uses a supercapacitor to store energy temporarily, senses light and motion events, and relays them using Bluetooth Low Energy. Using simulations and real deployments, we show that our sensor nodes adapt to their lighting conditions and continuously send measurements and events across nights and weekends. We use deployment data to continually adapt sensing to changing environmental patterns and transfer learning to reduce the training time in real deployments. In our 60-node deployment lasting two weeks, we observe a dead time of 0.1%. The periodic sensors that measure luminosity have a mean sampling period of 90 seconds, and the event sensors that detect motion with PIR captured 86% of the events on average compared to a battery-powered node. |
Tasks | Transfer Learning |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01968v1 |
https://arxiv.org/pdf/1909.01968v1.pdf | |
PWC | https://paperswithcode.com/paper/aces-automatic-configuration-of-energy |
Repo | |
Framework | |
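The abstract describes a reinforcement learning loop that trades sensing quality against harvested energy. A minimal tabular Q-learning sketch, under assumed state/action/reward definitions (discretized supercapacitor and light levels as state, the next sampling period as action), might look like this; the real system's state features, reward shaping, and BLE relaying are not modelled.

```python
import random
from collections import defaultdict

# State: (supercapacitor-voltage bucket, ambient-light bucket).
# Action: next sampling period in seconds (shorter = better sensing, more energy use).
PERIODS = [15, 30, 60, 120, 300]
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
Q = defaultdict(float)

def choose_period(state):
    """Epsilon-greedy choice of the next sampling period."""
    if random.random() < EPS:
        return random.choice(PERIODS)
    return max(PERIODS, key=lambda a: Q[(state, a)])

def reward(period, died):
    # Reward frequent sensing, heavily penalize a dead node (empty supercapacitor).
    return -1000.0 if died else 1.0 / period

def update(state, action, r, next_state):
    """One Q-learning update after observing the transition."""
    best_next = max(Q[(next_state, a)] for a in PERIODS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```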
What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use
Title | What Clinicians Want: Contextualizing Explainable Machine Learning for Clinical End Use |
Authors | Sana Tonekaboni, Shalmali Joshi, Melissa D McCradden, Anna Goldenberg |
Abstract | Translating machine learning (ML) models effectively to clinical practice requires establishing clinicians’ trust. Explainability, or the ability of an ML model to justify its outcomes and assist clinicians in rationalizing the model prediction, has been generally understood to be critical to establishing trust. However, the field suffers from a lack of concrete definitions for usable explanations in different settings. To identify specific aspects of explainability that may catalyze building trust in ML models, we surveyed clinicians from two distinct acute care specialties (Intensive Care Unit and Emergency Department). We use their feedback to characterize when explainability helps to improve clinicians’ trust in ML models. We further identify the classes of explanations that clinicians identified as most relevant and crucial for effective translation to clinical practice. Finally, we discern concrete metrics for rigorous evaluation of clinical explainability methods. By integrating perceptions of explainability between clinicians and ML researchers, we hope to facilitate the endorsement, broader adoption, and sustained use of ML systems in healthcare. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05134v2 |
https://arxiv.org/pdf/1905.05134v2.pdf | |
PWC | https://paperswithcode.com/paper/what-clinicians-want-contextualizing |
Repo | |
Framework | |
Eliciting and Enforcing Subjective Individual Fairness
Title | Eliciting and Enforcing Subjective Individual Fairness |
Authors | Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, Zhiwei Steven Wu |
Abstract | We revisit the notion of individual fairness first proposed by Dwork et al. [2012], which asks that “similar individuals should be treated similarly”. A primary difficulty with this definition is that it assumes a completely specified fairness metric for the task at hand. In contrast, we consider a framework for fairness elicitation, in which fairness is indirectly specified only via a sample of pairs of individuals who should be treated (approximately) equally on the task. We make no assumption that these pairs are consistent with any metric. We provide a provably convergent oracle-efficient algorithm for minimizing error subject to the fairness constraints, and prove generalization theorems for both accuracy and fairness. Since the constrained pairs could be elicited either from a panel of judges, or from particular individuals, our framework provides a means for algorithmically enforcing subjective notions of fairness. We report on preliminary findings of a behavioral study of subjective fairness using human-subject fairness constraints elicited on the COMPAS criminal recidivism dataset. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10660v1 |
https://arxiv.org/pdf/1905.10660v1.pdf | |
PWC | https://paperswithcode.com/paper/eliciting-and-enforcing-subjective-individual |
Repo | |
Framework | |
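The paper's algorithm is oracle-efficient and provably convergent; as a loose, hedged sketch of the underlying idea only, the toy below trains a linear classifier with a penalty on elicited pairs whose predicted treatment differs by more than an allowed gap. The pair indices, gap, and penalty weight are hypothetical.

```python
import torch

# Toy data: X features, y binary labels, and elicited pairs (i, j) of individuals
# that judges said should receive (approximately) equal treatment.
n, d = 200, 5
X, y = torch.randn(n, d), torch.randint(0, 2, (n,)).float()
pairs = [(0, 17), (3, 42), (8, 99)]          # hypothetical elicited constraints
gamma, lam = 0.1, 5.0                        # allowed gap and penalty weight

w = torch.zeros(d, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

for step in range(500):
    scores = X @ w
    p = torch.sigmoid(scores)
    # Penalize pairs whose predicted treatment differs by more than gamma.
    fairness = sum(torch.relu((p[i] - p[j]).abs() - gamma) for i, j in pairs)
    loss = bce(scores, y) + lam * fairness
    opt.zero_grad(); loss.backward(); opt.step()
```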
Path Planning Games
Title | Path Planning Games |
Authors | Yi Li, Yevgeniy Vorobeychik |
Abstract | Path planning is a fundamental and extensively explored problem in robotic control. We present a novel economic perspective on path planning. Specifically, we investigate strategic interactions among path planning agents using a game theoretic path planning framework. Our focus is on the economic tension between two important objectives: efficiency in the agents’ achieving their goals, and safety in navigating toward them. We begin by developing a novel mathematical formulation for path planning that trades off these objectives when the behavior of other agents is fixed. We then use this formulation for approximating Nash equilibria in path planning games, as well as to develop a multi-agent cooperative path planning formulation. Through several case studies, we show that in a path planning game, safety is often significantly compromised compared to a cooperative solution. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13880v1 |
https://arxiv.org/pdf/1910.13880v1.pdf | |
PWC | https://paperswithcode.com/paper/path-planning-games |
Repo | |
Framework | |
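One way to read the "approximating Nash equilibria" step is iterated best response: each agent replans a shortest path whose edge costs add a safety penalty near the other agent's current path. The grid world, penalty weight, and stopping rule below are assumptions for illustration, not the paper's formulation.

```python
import networkx as nx

G = nx.grid_2d_graph(8, 8)                       # toy planar workspace
starts, goals = [(0, 0), (7, 0)], [(7, 7), (0, 7)]
beta = 3.0                                       # weight on the safety term

def best_response(me, other_path):
    """Shortest path for agent `me`, penalizing cells the other agent occupies."""
    occupied = set(other_path)
    def w(u, v, _):
        return 1.0 + (beta if v in occupied else 0.0)
    return nx.shortest_path(G, starts[me], goals[me], weight=w)

# Iterated best response as a crude approximation of a Nash equilibrium.
paths = [nx.shortest_path(G, starts[i], goals[i]) for i in range(2)]
for _ in range(10):
    new = [best_response(0, paths[1]), best_response(1, paths[0])]
    if new == paths:
        break
    paths = new
```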
Applications of Linear Defeasible Logic: combining resource consumption and exceptions to energy management and business processes
Title | Applications of Linear Defeasible Logic: combining resource consumption and exceptions to energy management and business processes |
Authors | Francesco Olivieri, Guido Governatori, Claudio Tomazzoli, Matteo Cristani |
Abstract | Linear Logic and Defeasible Logic have been adopted to formalise different features of knowledge representation: consumption of resources, and non-monotonic reasoning, in particular to represent exceptions. Recently, a framework that combines sub-structural features, corresponding to the consumption of resources, with defeasibility aspects to handle potentially conflicting information has been discussed in the literature by some of the authors. Two very relevant applications have emerged: energy management and business process management. We illustrate a set of guidelines for applying linear defeasible logic to those contexts. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05737v1 |
https://arxiv.org/pdf/1908.05737v1.pdf | |
PWC | https://paperswithcode.com/paper/applications-of-linear-defeasible-logic |
Repo | |
Framework | |
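As a very rough toy (not the authors' calculus), the sketch below combines the two ingredients named in the abstract: rules that consume resources when they fire (the linear, sub-structural part), and conflicting rules resolved defeasibly by a priority relation, in an energy-management flavour. All rule names, priorities, and resources are made up for illustration.

```python
# Each rule: premises, conclusion, resources it consumes, and a priority used to
# resolve conflicts between rules with opposite conclusions.
rules = [
    {"if": {"request"}, "then": "run_appliance", "consumes": {"energy": 3}, "priority": 1},
    {"if": {"peak_hours"}, "then": "-run_appliance", "consumes": {}, "priority": 2},
]

def applicable(rule, facts, budget):
    return rule["if"] <= facts and all(budget.get(r, 0) >= c for r, c in rule["consumes"].items())

def derive(facts, budget):
    conclusions = set()
    for rule in rules:
        if not applicable(rule, facts, budget):
            continue
        negation = rule["then"][1:] if rule["then"].startswith("-") else "-" + rule["then"]
        blocked = any(r is not rule and r["then"] == negation
                      and r["priority"] >= rule["priority"] and applicable(r, facts, budget)
                      for r in rules)
        if blocked:
            continue
        for res, cost in rule["consumes"].items():  # linear: firing uses up resources
            budget[res] -= cost
        conclusions.add(rule["then"])
    return conclusions

print(derive({"request"}, {"energy": 5}))                 # {'run_appliance'}
print(derive({"request", "peak_hours"}, {"energy": 5}))   # {'-run_appliance'}: exception wins
```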
Enhanced Transfer Learning with ImageNet Trained Classification Layer
Title | Enhanced Transfer Learning with ImageNet Trained Classification Layer |
Authors | Tasfia Shermin, Shyh Wei Teng, Manzur Murshed, Guojun Lu, Ferdous Sohel, Manoranjan Paul |
Abstract | Parameter fine-tuning is a transfer learning approach whereby learned parameters from a pre-trained source network are transferred to the target network, followed by fine-tuning. Prior research has shown that this approach is capable of improving task performance. However, the impact of the ImageNet pre-trained classification layer in parameter fine-tuning is mostly unexplored in the literature. In this paper, we propose a fine-tuning approach with the pre-trained classification layer. We employ layer-wise fine-tuning to determine which layers should be frozen for optimal performance. Our empirical analysis demonstrates that the proposed fine-tuning performs better than traditional fine-tuning. This finding indicates that the pre-trained classification layer holds less category-specific or more global information than previously believed. Thus, we hypothesize that the presence of this layer is crucial for growing network depth to adapt better to a new task. Our study shows that careful normalization and scaling are essential for creating harmony between the pre-trained and new layers for target domain adaptation. We evaluate the proposed depth-augmented networks for fine-tuning on several challenging benchmark datasets and show that they can achieve higher classification accuracy than contemporary transfer learning approaches. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-03-25 |
URL | https://arxiv.org/abs/1903.10150v2 |
https://arxiv.org/pdf/1903.10150v2.pdf | |
PWC | https://paperswithcode.com/paper/depth-augmented-networks-with-optimal-fine |
Repo | |
Framework | |
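A minimal PyTorch sketch of the fine-tuning recipe described above: keep the ImageNet-trained 1000-way classification layer, grow depth with a normalized new head, and freeze lower layers. The backbone choice, the torchvision ≥ 0.13 weights API, the frozen-layer cut-off, and the target class count are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn
from torchvision import models

# Keep the ImageNet-trained 1000-way fc layer and grow depth on top of it,
# with normalization to reconcile the old logits with the new head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

num_target_classes = 10                 # hypothetical target task
model = nn.Sequential(
    backbone,                           # ends in the retained 1000-way fc layer
    nn.BatchNorm1d(1000),               # rescale the pre-trained logits
    nn.ReLU(),
    nn.Linear(1000, num_target_classes),
)

# Layer-wise fine-tuning: freeze everything before layer4, train the rest + new head.
for name, p in backbone.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))
```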
MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees
Title | MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees |
Authors | Marko Vasic, Andrija Petrovic, Kaiyuan Wang, Mladen Nikolic, Rishabh Singh, Sarfraz Khurshid |
Abstract | Deep Reinforcement Learning (DRL) has led to many recent breakthroughs on complex control tasks, such as defeating the best human player in the game of Go. However, decisions made by the DRL agent are not explainable, hindering its applicability in safety-critical settings. Viper, a recently proposed technique, constructs a decision tree policy by mimicking the DRL agent. Decision trees are interpretable, as each action can be traced back to the decision rule path that led to it. However, one global decision tree approximating the DRL policy has significant limitations with respect to the geometry of decision boundaries. We propose MoET, a more expressive, yet still interpretable model based on Mixture of Experts, consisting of a gating function that partitions the state space, and multiple decision tree experts that specialize on different partitions. We propose a training procedure to support non-differentiable decision tree experts and integrate it into the imitation learning procedure of Viper. We evaluate our algorithm on four OpenAI Gym environments, and show that the policy constructed in this way is more performant and better mimics the DRL agent by lowering mispredictions and increasing the reward. We also show that MoET policies are amenable to verification using off-the-shelf automated theorem provers such as Z3. |
Tasks | Game of Go, Imitation Learning |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06717v2 |
https://arxiv.org/pdf/1906.06717v2.pdf | |
PWC | https://paperswithcode.com/paper/moet-interpretable-and-verifiable |
Repo | |
Framework | |
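A simplified scikit-learn sketch of the mixture-of-expert-trees idea: states and teacher actions are collected by imitation, a hard k-means gate partitions the state space (the paper learns a soft, differentiable gate), and one shallow decision tree is fit per partition. The environment, teacher policy, and hyperparameters are stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Imitation data from a hypothetical DRL teacher: states and the actions it took.
states = np.random.randn(5000, 4)              # e.g. CartPole-like observations
actions = (states[:, 2] > 0).astype(int)       # stand-in for the teacher's actions

# Gating function: here a hard k-means partition of the state space.
n_experts = 4
gate = KMeans(n_clusters=n_experts, n_init=10).fit(states)
experts = []
for k in range(n_experts):
    mask = gate.labels_ == k
    experts.append(DecisionTreeClassifier(max_depth=4).fit(states[mask], actions[mask]))

def moet_policy(state):
    """Route the state to its expert tree and return that tree's action."""
    k = gate.predict(state.reshape(1, -1))[0]
    return experts[k].predict(state.reshape(1, -1))[0]
```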
Achieving Robustness to Aleatoric Uncertainty with Heteroscedastic Bayesian Optimisation
Title | Achieving Robustness to Aleatoric Uncertainty with Heteroscedastic Bayesian Optimisation |
Authors | Ryan-Rhys Griffiths, Miguel Garcia-Ortegon, Alexander A. Aldrick, Alpha A. Lee |
Abstract | Bayesian optimisation is an important decision-making tool for high-stakes applications in drug discovery and materials design. An oft-overlooked modelling consideration, however, is the representation of input-dependent or heteroscedastic aleatoric uncertainty. The cost of misrepresenting this uncertainty as homoscedastic could be high in drug discovery applications, where neglecting heteroscedasticity in high-throughput virtual screening could lead to a failed drug discovery program. In this paper, we propose a heteroscedastic Bayesian optimisation scheme which both represents and penalises aleatoric noise in the suggestions. Our scheme features a heteroscedastic Gaussian Process (GP) as the surrogate model in conjunction with two acquisition heuristics. First, we extend the augmented expected improvement (AEI) heuristic to the heteroscedastic setting, and second, we introduce a new acquisition function, aleatoric noise-penalised expected improvement (ANPEI), based on a simple scalarisation of the performance and noise objectives. Both methods penalise aleatoric noise in the suggestions and yield improved performance relative to a naive implementation of homoscedastic Bayesian optimisation on toy problems as well as a real-world optimisation problem. |
Tasks | Bayesian Optimisation, Decision Making, Drug Discovery |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.07779v1 |
https://arxiv.org/pdf/1910.07779v1.pdf | |
PWC | https://paperswithcode.com/paper/achieving-robustness-to-aleatoric-uncertainty |
Repo | |
Framework | |
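A hedged sketch of noise-penalised acquisition on a toy 1-D problem: one GP models the mean, a second GP fit to log squared residuals stands in for the paper's heteroscedastic noise model, and the acquisition subtracts a scaled noise estimate from expected improvement. The kernels, the residual-based noise proxy, and the scalarisation weight are all assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy 1-D objective whose noise grows with x (heteroscedastic aleatoric noise).
def f(x):
    return np.sin(3 * x) + np.random.normal(0, 0.05 + 0.5 * x)

X = np.random.uniform(0, 1, (20, 1))
y = np.array([f(x[0]) for x in X])

gp_mean = GaussianProcessRegressor(normalize_y=True).fit(X, y)
# A second GP on log squared residuals serves as a crude aleatoric-noise model.
resid2 = (y - gp_mean.predict(X)) ** 2
gp_noise = GaussianProcessRegressor(normalize_y=True).fit(X, np.log(resid2 + 1e-6))

def noise_penalised_ei(x_cand, best, beta=0.5):
    mu, std = gp_mean.predict(x_cand, return_std=True)
    z = (mu - best) / np.maximum(std, 1e-9)
    ei = (mu - best) * norm.cdf(z) + std * norm.pdf(z)    # expected improvement
    noise = np.exp(gp_noise.predict(x_cand))              # predicted aleatoric variance
    return ei - beta * noise                               # penalise noisy regions

grid = np.linspace(0, 1, 200).reshape(-1, 1)
x_next = grid[np.argmax(noise_penalised_ei(grid, y.max()))]
```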
Machine learning applications in time series hierarchical forecasting
Title | Machine learning applications in time series hierarchical forecasting |
Authors | Mahdi Abolghasemi, Rob J Hyndman, Garth Tarr, Christoph Bergmeir |
Abstract | Hierarchical forecasting (HF) is needed in many situations in the supply chain (SC) because managers often need forecasts at different levels of the SC to make decisions. Top-Down (TD), Bottom-Up (BU) and Optimal Combination (COM) are common HF models. These approaches are static and often ignore the dynamics of the series while disaggregating them. Consequently, they may fail to perform well if the investigated groups of time series are subject to large changes, such as during periods of promotional sales. We address the HF problem of predicting real-world sales time series that are highly impacted by promotions. We use three machine learning (ML) models to capture sales variations over time. Artificial neural network (ANN), extreme gradient boosting (XGBoost), and support vector regression (SVR) algorithms are used to estimate the proportions of lower-level time series from the upper level. We perform an in-depth analysis of 61 groups of time series with different volatilities and show that ML models are competitive and outperform some well-established models in the literature. |
Tasks | Time Series |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00370v1 |
https://arxiv.org/pdf/1912.00370v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-applications-in-time-series |
Repo | |
Framework | |
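A small sketch of the ML-based top-down disaggregation described above: a regressor (scikit-learn's GradientBoostingRegressor as a stand-in for the paper's ANN/XGBoost/SVR) predicts bottom-level proportions from features such as promotion flags, and a top-level forecast is split by the predicted, renormalized proportions. The toy hierarchy, features, and top-level forecast are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy hierarchy: one total series disaggregated into 3 bottom-level series.
T, n_bottom = 120, 3
bottom = np.abs(np.random.randn(T, n_bottom)) * np.array([5.0, 3.0, 1.0])
total = bottom.sum(axis=1)
proportions = bottom / total[:, None]

# Features that drive proportion shifts: a promotion flag, month, and series id.
promo = np.random.binomial(1, 0.2, (T, n_bottom))
month = np.tile(np.arange(T) % 12, (n_bottom, 1)).T
series_id = np.tile(np.arange(n_bottom), T)             # matches row-major ravel order
X = np.column_stack([promo.ravel(), month.ravel(), series_id])
y = proportions.ravel()

split = (T - 12) * n_bottom                             # hold out the last 12 periods
model = GradientBoostingRegressor().fit(X[:split], y[:split])

# Forecast the total with any top-level model, then disaggregate with ML proportions.
total_forecast = total[-12:].mean()                     # placeholder top-level forecast
p_hat = np.clip(model.predict(X[split:]).reshape(12, n_bottom), 0, None)
p_hat = p_hat / p_hat.sum(axis=1, keepdims=True)        # renormalize to sum to one
bottom_forecast = total_forecast * p_hat
```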
Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization
Title | Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization |
Authors | Baojian Zhou, Feng Chen, Yiming Ying |
Abstract | Stochastic optimization algorithms update models sequentially with cheap per-iteration costs, which makes them amenable to large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity information is very specific, e.g., convex sparsity-inducing norms or the $\ell^0$-norm. However, these norms cannot be directly applied to the problem of complex (non-convex) graph-structured sparsity models, which have important applications in disease outbreak detection, social networks, etc. In this paper, we propose a stochastic gradient-based method for solving graph-structured sparsity constraint problems that is not restricted to the least-squares loss. We prove that our algorithm enjoys linear convergence up to a constant error, which is competitive with its counterparts in the batch learning setting. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms. |
Tasks | Stochastic Optimization |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03652v1 |
https://arxiv.org/pdf/1905.03652v1.pdf | |
PWC | https://paperswithcode.com/paper/190503652 |
Repo | |
Framework | |
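The core loop is a stochastic gradient step followed by a projection onto a sparse support. The sketch below uses plain top-k hard thresholding on a least-squares problem as a stand-in for the paper's graph-structured head/tail projections.

```python
import numpy as np

def stoiht(A, b, k, lr=0.01, epochs=30, batch=32):
    """Stochastic iterative hard thresholding for min ||Ax - b||^2 s.t. ||x||_0 <= k.

    The paper projects onto graph-structured supports; here the projection simply
    keeps the k largest-magnitude entries as a plain-sparsity stand-in.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(np.random.permutation(n), n // batch):
            Ai, bi = A[idx], b[idx]
            grad = Ai.T @ (Ai @ x - bi) / len(idx)        # stochastic gradient
            x = x - lr * grad
            support = np.argsort(np.abs(x))[-k:]          # hard thresholding step
            pruned = np.zeros_like(x)
            pruned[support] = x[support]
            x = pruned
    return x

A = np.random.randn(500, 100)
x_true = np.zeros(100); x_true[:5] = 3.0
b = A @ x_true + 0.01 * np.random.randn(500)
x_hat = stoiht(A, b, k=5)
```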
SEntNet: Source-aware Recurrent Entity Network for Dialogue Response Selection
Title | SEntNet: Source-aware Recurrent Entity Network for Dialogue Response Selection |
Authors | Jiahuan Pei, Arent Stienstra, Julia Kiseleva, Maarten de Rijke |
Abstract | Dialogue response selection is an important part of Task-oriented Dialogue Systems (TDSs); it aims to predict an appropriate response given a dialogue context. Obtaining key information from a complex, long dialogue context is challenging, especially when different sources of information are available, e.g., the user’s utterances, the system’s responses, and results retrieved from a knowledge base (KB). Previous work ignores the type of information source and merges sources for response selection. However, accounting for the source type may lead to remarkable differences in the quality of response selection. We propose the Source-aware Recurrent Entity Network (SEntNet), which is aware of different information sources for the response selection process. SEntNet achieves this by employing source-specific memories to exploit differences in the usage of words and syntactic structure from different information sources (user, system, and KB). Experimental results show that SEntNet obtains 91.0% accuracy on the Dialog bAbI dataset, outperforming prior work by 4.7%. On the DSTC2 dataset, SEntNet obtains an accuracy of 41.2%, beating source-unaware recurrent entity networks by 2.4%. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2019-06-16 |
URL | https://arxiv.org/abs/1906.06788v4 |
https://arxiv.org/pdf/1906.06788v4.pdf | |
PWC | https://paperswithcode.com/paper/sentnet-source-aware-recurrent-entity-network |
Repo | |
Framework | |
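A toy source-aware scorer in PyTorch, assuming tokenized inputs already split by source: one GRU memory per source (user, system, KB) stands in for the paper's recurrent entity cells, and candidate responses are scored by a dot product with the fused context. Vocabulary size, dimensions, and the bag-of-words candidate encoder are assumptions.

```python
import torch
import torch.nn as nn

class SourceAwareScorer(nn.Module):
    """Toy source-aware context encoder with one memory (here a GRU) per source."""
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.mem = nn.ModuleDict({s: nn.GRU(dim, dim, batch_first=True)
                                  for s in ("user", "system", "kb")})
        self.out = nn.Linear(3 * dim, dim)

    def forward(self, context, candidate):
        # context: {"user": LongTensor[B, L], "system": ..., "kb": ...}
        states = []
        for source, tokens in context.items():
            _, h = self.mem[source](self.emb(tokens))
            states.append(h[-1])
        ctx = self.out(torch.cat(states, dim=-1))
        cand = self.emb(candidate).mean(dim=1)            # bag-of-words response encoding
        return (ctx * cand).sum(dim=-1)                   # matching score per candidate

model = SourceAwareScorer(vocab=1000)
ctx = {s: torch.randint(0, 1000, (4, 12)) for s in ("user", "system", "kb")}
score = model(ctx, torch.randint(0, 1000, (4, 8)))
```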
Multi-Scale Attention Network for Crowd Counting
Title | Multi-Scale Attention Network for Crowd Counting |
Authors | Rahul Rama Varior, Bing Shuai, Joseph Tighe, Davide Modolo |
Abstract | In crowd counting datasets, people appear at different scales, depending on their distance from the camera. To address this issue, we propose a novel multi-branch scale-aware attention network that exploits the hierarchical structure of convolutional neural networks and generates, in a single forward pass, multi-scale density predictions from different layers of the architecture. To aggregate these maps into our final prediction, we present a new soft attention mechanism that learns a set of gating masks. Furthermore, we introduce a scale-aware loss function to regularize the training of different branches and guide them to specialize on a particular scale. As this new training requires annotations for the size of each head, we also propose a simple, yet effective technique to estimate them automatically. Finally, we present an ablation study on each of these components and compare our approach against the literature on four crowd counting datasets: UCF-QNRF, ShanghaiTech A & B, and UCF_CC_50. Our approach achieves state-of-the-art results on all of them, with a remarkable improvement on UCF-QNRF (+25% reduction in error). |
Tasks | Crowd Counting |
Published | 2019-01-17 |
URL | https://arxiv.org/abs/1901.06026v3 |
https://arxiv.org/pdf/1901.06026v3.pdf | |
PWC | https://paperswithcode.com/paper/scale-aware-attention-network-for-crowd |
Repo | |
Framework | |
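A compact sketch of the multi-branch soft-attention fusion described above: density maps are predicted from two feature depths and combined with learned per-pixel gating masks; summing the fused map gives the count. The tiny backbone and two-branch setup are simplifications of the paper's network, and the scale-aware loss is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDensity(nn.Module):
    """Toy multi-branch density estimator fused by learned soft attention masks."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.density1 = nn.Conv2d(32, 1, 1)      # shallow branch: small (far) heads
        self.density2 = nn.Conv2d(64, 1, 1)      # deeper branch: large (near) heads
        self.attn = nn.Conv2d(32, 2, 1)          # one gating mask per branch

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        d1 = self.density1(f1)
        d2 = F.interpolate(self.density2(f2), size=d1.shape[-2:], mode="bilinear",
                           align_corners=False)
        masks = torch.softmax(self.attn(f1), dim=1)       # soft attention over branches
        return masks[:, :1] * d1 + masks[:, 1:] * d2      # fused density map

pred = MultiScaleDensity()(torch.randn(2, 3, 64, 64))
count = pred.sum(dim=(1, 2, 3))                           # crowd count = integral of density
```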
Deep Learning Acceleration Techniques for Real Time Mobile Vision Applications
Title | Deep Learning Acceleration Techniques for Real Time Mobile Vision Applications |
Authors | Gael Kamdem De Teyou |
Abstract | Deep Learning (DL) has become a crucial technology for Artificial Intelligence (AI). It is a powerful technique for automatically extracting high-level features from complex data, which can be exploited for applications such as computer vision, natural language processing, cybersecurity, and communications. For the particular case of computer vision, several algorithms, such as object detection in real-time videos, have been proposed, and they work well on desktop GPUs and distributed computing platforms. However, these algorithms are still too heavy for mobile and embedded visual applications. The rapid spread of smart portable devices and the emerging 5G network are introducing new smart multimedia applications in mobile environments. As a consequence, the possibility of implementing deep neural networks in mobile environments has attracted many researchers. This paper presents emerging deep learning acceleration techniques that can enable the delivery of real-time visual recognition into the hands of end users, anytime and anywhere. |
Tasks | Object Detection |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03418v2 |
https://arxiv.org/pdf/1905.03418v2.pdf | |
PWC | https://paperswithcode.com/paper/190503418 |
Repo | |
Framework | |
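As one concrete example of the acceleration techniques the paper surveys, post-training dynamic quantization in PyTorch converts Linear layers to int8 kernels with a single call; the toy model below is a placeholder, not one of the paper's networks.

```python
import torch
import torch.nn as nn

# Post-training dynamic quantization of the Linear layers to int8 — a standard
# technique for shrinking models and speeding them up on mobile-class CPUs.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)          # same interface, smaller and faster Linear kernels
```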