Paper Group ANR 361
SANVis: Visual Analytics for Understanding Self-Attention Networks
Title | SANVis: Visual Analytics for Understanding Self-Attention Networks |
Authors | Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, Jaehyo Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo |
Abstract | Attention networks, a deep neural network architecture inspired by humans’ attention mechanism, have seen significant success in image captioning, machine translation, and many other applications. Recently, they have been further evolved into an advanced approach called multi-head self-attention networks, which can encode a set of input vectors, e.g., word vectors in a sentence, into another set of vectors. Such encoding aims at simultaneously capturing diverse syntactic and semantic features within a set, each of which corresponds to a particular attention head, forming altogether multi-head attention. Meanwhile, the increased model complexity prevents users from easily understanding and manipulating the inner workings of models. To tackle the challenges, we present a visual analytics system called SANVis, which helps users understand the behaviors and the characteristics of multi-head self-attention networks. Using a state-of-the-art self-attention model called Transformer, we demonstrate usage scenarios of SANVis in machine translation tasks. Our system is available at http://short.sanvis.org |
Tasks | Image Captioning, Machine Translation |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.09595v1 |
https://arxiv.org/pdf/1909.09595v1.pdf | |
PWC | https://paperswithcode.com/paper/sanvis-visual-analytics-for-understanding |
Repo | |
Framework | |
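The abstract above describes multi-head self-attention as encoding a set of input vectors into another set of vectors, with each head capturing different features. The following is a minimal sketch of that idea, not SANVis or the Transformer implementation. It assumes identity projections (queries, keys, and values are all the raw input slice) and splits each vector evenly across heads; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    Assumption: no learned projection matrices, to keep the sketch short.
    Returns (outputs, weights); weights[i][j] is how much item i attends to item j.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * vectors[j][c] for j in range(len(vectors))) for c in range(d)]
        )
    return outputs, weights


def multi_head_self_attention(tokens, num_heads):
    """Split each token vector into num_heads slices and attend per slice."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    size = dim // num_heads
    outputs = [[] for _ in tokens]
    all_weights = []
    for h in range(num_heads):
        piece = [t[h * size:(h + 1) * size] for t in tokens]
        out, w = attention_head(piece)
        all_weights.append(w)
        # Concatenate the per-head outputs back into one vector per token.
        for i, vec in enumerate(out):
            outputs[i].extend(vec)
    return outputs, all_weights
```

The per-head weight matrices returned here are the kind of quantity an attention visualization would display, one matrix per head.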
Compositional Generalization in Image Captioning
Title | Compositional Generalization in Image Captioning |
Authors | Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, Desmond Elliott |
Abstract | Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image–sentence ranking and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models. |
Tasks | Image Captioning |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04402v2 |
https://arxiv.org/pdf/1909.04402v2.pdf | |
PWC | https://paperswithcode.com/paper/compositional-generalization-in-image |
Repo | |
Framework | |
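The decoding step described in the abstract above, re-ranking candidate captions by their similarity to the image, can be sketched as follows. This is an illustration only: the abstract does not name the similarity measure, so cosine similarity over a shared embedding space is an assumption, and `rerank_captions` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_captions(image_vec, candidates):
    """Order candidate captions by similarity to the image, best first.

    candidates: list of (caption_text, caption_vec) pairs, where caption_vec
    is assumed to live in the same embedding space as image_vec.
    """
    scored = [(cosine(image_vec, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Toy example with made-up 2-d embeddings.
ranked = rerank_captions(
    [1.0, 0.0],
    [("a black cat", [0.1, 0.9]), ("a white dog", [0.9, 0.1]), ("a dog", [0.5, 0.5])],
)
```

In a real system the candidates would come from beam search and the vectors from the jointly trained ranking model.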
Causality and Bayesian network PDEs for multiscale representations of porous media
Title | Causality and Bayesian network PDEs for multiscale representations of porous media |
Authors | Kimoon Um, Eric Joseph Hall, Markos A. Katsoulakis, Daniel M. Tartakovsky |
Abstract | Microscopic (pore-scale) properties of porous media affect and often determine their macroscopic (continuum- or Darcy-scale) counterparts. Understanding the relationship between processes on these two scales is essential to both the derivation of macroscopic models of, e.g., transport phenomena in natural porous media, and the design of novel materials, e.g., for energy storage. Most microscopic properties exhibit complex statistical correlations and geometric constraints, which presents challenges for the estimation of macroscopic quantities of interest (QoIs), e.g., in the context of global sensitivity analysis (GSA) of macroscopic QoIs with respect to microscopic material properties. We present a systematic way of building correlations into stochastic multiscale models through Bayesian networks. This allows us to construct the joint probability density function (PDF) of model parameters through causal relationships that emulate engineering processes, e.g., the design of hierarchical nanoporous materials. Such PDFs also serve as input for the forward propagation of parametric uncertainty; our findings indicate that the inclusion of causal relationships impacts predictions of macroscopic QoIs. To assess the impact of correlations and causal relationships between microscopic parameters on macroscopic material properties, we use a moment-independent GSA based on the differential mutual information. Our GSA accounts for the correlated inputs and complex non-Gaussian QoIs. The global sensitivity indices are used to rank the effect of uncertainty in microscopic parameters on macroscopic QoIs, to quantify the impact of causality on the multiscale model’s predictions, and to provide physical interpretations of these results for hierarchical nanoporous materials. |
Tasks | |
Published | 2019-01-06 |
URL | http://arxiv.org/abs/1901.01604v1 |
http://arxiv.org/pdf/1901.01604v1.pdf | |
PWC | https://paperswithcode.com/paper/causality-and-bayesian-network-pdes-for |
Repo | |
Framework | |
Attention-based Fusion for Outfit Recommendation
Title | Attention-based Fusion for Outfit Recommendation |
Authors | Katrien Laenen, Marie-Francine Moens |
Abstract | This paper describes an attention-based fusion method for outfit recommendation which fuses the information in the product image and description to capture the most important, fine-grained product features into the item representation. We experiment with different kinds of attention mechanisms and demonstrate that the attention-based fusion improves item understanding. We outperform state-of-the-art outfit recommendation results on three benchmark datasets. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10585v1 |
https://arxiv.org/pdf/1908.10585v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-fusion-for-outfit |
Repo | |
Framework | |
Bilinear Bandits with Low-rank Structure
Title | Bilinear Bandits with Low-rank Structure |
Authors | Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak |
Abstract | We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. The unknown in the problem is a $d_1$ by $d_2$ matrix $\mathbf{\Theta}^*$ that defines the reward, and has low rank $r \ll \min\{d_1,d_2\}$. Determination of $\mathbf{\Theta}^*$ with this low-rank structure poses a significant challenge in finding the right exploration-exploitation tradeoff. In this work, we propose a new two-stage algorithm called “Explore-Subspace-Then-Refine” (ESTR). The first stage is an explicit subspace exploration, while the second stage is a linear bandit algorithm called “almost-low-dimensional OFUL” (LowOFUL) that exploits and further refines the estimated subspace via a regularization technique. We show that the regret of ESTR is $\widetilde{\mathcal{O}}((d_1+d_2)^{3/2} \sqrt{r T})$ where $\widetilde{\mathcal{O}}$ hides logarithmic factors and $T$ is the time horizon, which improves upon the regret of $\widetilde{\mathcal{O}}(d_1d_2\sqrt{T})$ attained for a naïve linear bandit reduction. We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a naïve linear bandit reduction. |
Tasks | |
Published | 2019-01-08 |
URL | https://arxiv.org/abs/1901.02470v2 |
https://arxiv.org/pdf/1901.02470v2.pdf | |
PWC | https://paperswithcode.com/paper/bilinear-bandits-with-low-rank-structure |
Repo | |
Framework | |
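To make the setting in the abstract above concrete, the sketch below evaluates a bilinear reward for a rank-1 parameter matrix and compares the two regret expressions with constants and logarithmic factors dropped. It is not the ESTR algorithm; the function names and the example numbers are hypothetical.

```python
import math


def bilinear_reward(x, theta, z):
    """Expected reward x^T Theta z for arm features x (length d1) and z (length d2)."""
    return sum(
        x[i] * theta[i][j] * z[j] for i in range(len(x)) for j in range(len(z))
    )


def rank_one(u, v):
    """Outer product u v^T, the simplest low-rank (r = 1) parameter matrix."""
    return [[ui * vj for vj in v] for ui in u]


def estr_regret_scale(d1, d2, r, T):
    """(d1 + d2)^{3/2} * sqrt(r * T), ignoring constants and log factors."""
    return (d1 + d2) ** 1.5 * math.sqrt(r * T)


def naive_regret_scale(d1, d2, T):
    """d1 * d2 * sqrt(T), ignoring constants and log factors."""
    return d1 * d2 * math.sqrt(T)


theta = rank_one([1.0, 2.0], [3.0, 4.0, 5.0])
reward = bilinear_reward([1.0, 1.0], theta, [1.0, 1.0, 1.0])
```

With these simplified expressions and $d_1 = d_2 = d$, the ESTR scale is smaller than the naive one exactly when $d > 8r$, which is consistent with the low-rank assumption $r \ll \min\{d_1, d_2\}$.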
Transfer Reward Learning for Policy Gradient-Based Text Generation
Title | Transfer Reward Learning for Policy Gradient-Based Text Generation |
Authors | James O’Neill, Danushka Bollegala |
Abstract | Task-specific scores are often used to optimize for and evaluate the performance of conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient methods are used since the gradient can be computed without requiring a differentiable objective. However, we argue that current n-gram overlap based measures that are used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs. These reward models either output a score of sentence-level syntactic and semantic similarity between entire predicted and target sentences as the expected return, or for intermediate phrases as segmented accumulative rewards. We demonstrate that using a Transferable Reward Learner leads to improved results on semantic evaluation measures in policy-gradient models for image captioning tasks. Our InferSent actor-critic model improves over a BLEU trained actor-critic model on MSCOCO when evaluated on a Word Mover’s Distance similarity measure by 6.97 points, also improving on a Sliding Window Cosine Similarity measure by 10.48 points. Similar performance improvements are also obtained on the smaller Flickr-30k dataset, demonstrating the general applicability of the proposed transfer learning method. |
Tasks | Image Captioning, Policy Gradient Methods, Semantic Similarity, Semantic Textual Similarity, Text Generation, Transfer Learning |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03622v1 |
https://arxiv.org/pdf/1909.03622v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-reward-learning-for-policy-gradient |
Repo | |
Framework | |
Deep Learning Based Energy Disaggregation and On/Off Detection of Household Appliances
Title | Deep Learning Based Energy Disaggregation and On/Off Detection of Household Appliances |
Authors | Jie Jiang, Qiuqiang Kong, Mark Plumbley, Nigel Gilbert |
Abstract | Energy disaggregation, a.k.a. Non-Intrusive Load Monitoring, aims to separate the energy consumption of individual appliances from the readings of a mains power meter measuring the total energy consumption of, e.g., a whole house. Energy consumption of individual appliances can be useful in many applications, e.g., providing appliance-level feedback to the end users to help them understand their energy consumption and ultimately save energy. Recently, with the availability of large-scale energy consumption datasets, various neural network models such as convolutional neural networks and recurrent neural networks have been investigated to solve the energy disaggregation problem. Neural network models can learn complex patterns from large amounts of data and have been shown to outperform traditional machine learning methods such as variants of hidden Markov models. However, current neural network methods for energy disaggregation are either computationally expensive or not capable of handling long-term dependencies. In this paper, we investigate the application of the recently developed WaveNet models to the task of energy disaggregation. Based on a real-world energy dataset collected from 20 households over two years, we show that WaveNet models outperform the state-of-the-art deep learning methods proposed in the literature for energy disaggregation in terms of both error measures and computational cost. On the basis of energy disaggregation, we then investigate the performance of two deep-learning based frameworks for the task of on/off detection, which aims at estimating whether an appliance is in operation or not. Based on the same dataset, we show that for the task of on/off detection the second framework, i.e., directly training a binary classifier, achieves better performance in terms of F1 score. |
Tasks | Non-Intrusive Load Monitoring |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1908.00941v2 |
https://arxiv.org/pdf/1908.00941v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-energy-disaggregation-and |
Repo | |
Framework | |
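The abstract above evaluates on/off detection by F1 score. As a small illustration of that metric, the sketch below derives on/off labels by thresholding an estimated appliance power series and scores them against ground truth. The thresholding step is an assumption made for the example; the abstract does not say how either of the paper's two frameworks produces its labels, and the 10-watt threshold and sample values are made up.

```python
def on_off_from_power(power_watts, threshold_watts):
    """Label each time step 1 (on) if estimated power exceeds the threshold, else 0."""
    return [1 if p > threshold_watts else 0 for p in power_watts]


def f1_score(truth, predicted):
    """F1 for the positive (on) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: ground-truth states and a disaggregated power estimate.
truth = [1, 1, 0, 0, 1]
estimated_power = [50.0, 5.0, 0.0, 30.0, 80.0]
predicted = on_off_from_power(estimated_power, threshold_watts=10.0)
score = f1_score(truth, predicted)
```

A directly trained binary classifier would replace `on_off_from_power` with the classifier's own predictions and be scored the same way.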
Logic Conditionals, Supervenience, and Selection Tasks
Title | Logic Conditionals, Supervenience, and Selection Tasks |
Authors | Giovanni Sileno |
Abstract | Principles of cognitive economy would require that concepts about objects, properties and relations should be introduced only if they simplify the conceptualisation of a domain. Unexpectedly, classic logic conditionals, specifying structures holding within elements of a formal conceptualisation, do not always satisfy this crucial principle. The paper argues that this requirement is captured by supervenience, hereby further identified as a property necessary for compression. The resulting theory suggests an alternative explanation of the empirical experiences observable in Wason’s selection tasks, associating human performance with conditionals on the ability of dealing with compression, rather than with logic necessity. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06773v2 |
https://arxiv.org/pdf/1907.06773v2.pdf | |
PWC | https://paperswithcode.com/paper/logic-conditionals-supervenience-and |
Repo | |
Framework | |
Optimal short-term memory before the edge of chaos in driven random recurrent networks
Title | Optimal short-term memory before the edge of chaos in driven random recurrent networks |
Authors | Taichi Haruna, Kohei Nakajima |
Abstract | The ability of discrete-time nonlinear recurrent neural networks to store time-varying small input signals is investigated by mean-field theory. The combination of a small input strength and mean-field assumptions makes it possible to derive an approximate expression for the conditional probability density of the state of a neuron given a past input signal. From this conditional probability density, we can analytically calculate short-term memory measures, such as memory capacity, mutual information, and Fisher information, and determine the relationships among these measures, which have not been clarified to date to the best of our knowledge. We show that the network contribution of these short-term memory measures peaks before the edge of chaos, where the dynamics of input-driven networks is stable but corresponding systems without input signals are unstable. |
Tasks | |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11213v1 |
https://arxiv.org/pdf/1912.11213v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-short-term-memory-before-the-edge-of |
Repo | |
Framework | |
Structural modeling using overlapped group penalties for discovering predictive biomarkers for subgroup analysis
Title | Structural modeling using overlapped group penalties for discovering predictive biomarkers for subgroup analysis |
Authors | Chong Ma, Wenxuan Deng, Shuangge Ma, Ray Liu, Kevin Galinsky |
Abstract | The identification of predictive biomarkers from a large number of covariates for subgroup analysis has attracted fundamental attention in medical research. In this article, we propose a generalized penalized regression method with a novel penalty function that enforces the hierarchy structure between the prognostic and predictive effects, such that a nonzero predictive effect forces its ancestor prognostic effects to be nonzero in the model. Our method is able to select useful predictive biomarkers by yielding a sparse, interpretable, and predictable model for subgroup analysis, and can deal with different types of response variable such as continuous, categorical, and time-to-event data. We show that our method is asymptotically consistent under some regularity conditions. To minimize the generalized penalized regression model, we propose a novel optimization algorithm, named smog, that integrates majorization-minimization with the alternating direction method of multipliers. An extensive simulation study and a real case study demonstrate that our method is very powerful for discovering the true predictive biomarkers and identifying subgroups of patients. |
Tasks | |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11648v1 |
http://arxiv.org/pdf/1904.11648v1.pdf | |
PWC | https://paperswithcode.com/paper/structural-modeling-using-overlapped-group |
Repo | |
Framework | |
Improving Adversarial Robustness of Ensembles with Diversity Training
Title | Improving Adversarial Robustness of Ensembles with Diversity Training |
Authors | Sanjay Kariyappa, Moinuddin K. Qureshi |
Abstract | Deep Neural Networks are vulnerable to adversarial attacks even in settings where the attacker has no direct access to the model being attacked. Such attacks usually rely on the principle of transferability, whereby an attack crafted on a surrogate model tends to transfer to the target model. We show that an ensemble of models with misaligned loss gradients can provide an effective defense against transfer-based attacks. Our key insight is that an adversarial example is less likely to fool multiple models in the ensemble if their loss functions do not increase in a correlated fashion. To this end, we propose Diversity Training, a novel method to train an ensemble of models with uncorrelated loss functions. We show that our method significantly improves the adversarial robustness of ensembles and can also be combined with existing methods to create a stronger defense. |
Tasks | |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.09981v1 |
http://arxiv.org/pdf/1901.09981v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-adversarial-robustness-of-ensembles |
Repo | |
Framework | |
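The key idea in the abstract above is that ensemble members whose loss gradients are misaligned are harder to fool with a single transferred perturbation. The sketch below measures that alignment for toy linear models with a squared loss, using cosine similarity between input gradients. Cosine similarity and the linear models are assumptions for illustration; this is not the paper's training objective, and all names are hypothetical.

```python
import math


def input_gradient(weights, x, y):
    """Gradient with respect to x of the loss 0.5 * (w . x - y)^2 for a linear model."""
    error = sum(w * xi for w, xi in zip(weights, x)) - y
    return [error * w for w in weights]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_pairwise_alignment(models, x, y):
    """Largest cosine similarity between the input gradients of any two models."""
    grads = [input_gradient(w, x, y) for w in models]
    return max(
        cosine(grads[i], grads[j])
        for i in range(len(grads))
        for j in range(i + 1, len(grads))
    )


x, y = [2.0, 3.0], 0.0
diverse = max_pairwise_alignment([[1.0, 0.0], [0.0, 1.0]], x, y)
aligned = max_pairwise_alignment([[1.0, 0.0], [2.0, 0.0]], x, y)
```

A lower value means a perturbation that raises one model's loss is less likely to raise the others', which is the property a diversity-oriented training penalty would encourage.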
Depth from Small Motion using Rank-1 Initialization
Title | Depth from Small Motion using Rank-1 Initialization |
Authors | Peter O. Fasogbon |
Abstract | Depth from Small Motion (DfSM) (Ha et al., 2016) is particularly interesting for commercial handheld devices because it makes it possible to obtain depth information with minimal user effort and cooperation. Due to speed and memory constraints on these devices, the self-calibration optimization of the method using Bundle Adjustment (BA) is run on as few as 10-15 images. As a result, the optimization tends to take many iterations to converge, or may not converge at all in some cases. This work proposes a robust initialization for the bundle adjustment using the rank-1 factorization method (Tomasi and Kanade, 1992), (Aguiar and Moura, 1999a). We create a constraint matrix that is rank-1 in a noiseless situation, then use SVD to compute the inverse depth values and the camera motion. With this initialization, bundle adjustment needs only about a quarter of the iterations to converge. We also propose a gridded feature extraction technique so that only a small set of important features is tracked across the image frames. This also speeds up the full execution time on the mobile device. For the experiments, we have documented the execution time with the proposed Rank-1 initialization on two mobile device platforms using optimized accelerations with CPU-GPU co-processing. The combination of Rank-1 initialization and BA generates more robust depth maps and is significantly faster than using BA alone. |
Tasks | Calibration |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04058v1 |
https://arxiv.org/pdf/1907.04058v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-from-small-motion-using-rank-1 |
Repo | |
Framework | |
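The abstract above recovers two sets of unknowns by factorizing a matrix that is rank-1 in the noiseless case. The sketch below shows that factorization step in isolation. It uses power iteration to find the dominant singular triplet as a dependency-free stand-in for the SVD named in the abstract, and the input matrix is a made-up example, not the paper's constraint matrix.

```python
import math


def rank1_factor(matrix, iterations=100):
    """Best rank-1 approximation matrix ~ sigma * u v^T via power iteration.

    Stand-in for a full SVD: returns the largest singular value sigma and the
    corresponding unit vectors u (one entry per row) and v (one per column).
    Assumes the matrix is nonzero and the start vector is not orthogonal to
    the dominant right singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iterations):
        # u <- normalize(M v)
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(a * a for a in u))
        u = [a / norm_u for a in u]
        # v <- normalize(M^T u); the norm before normalizing estimates sigma
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(b * b for b in v))
        v = [b / sigma for b in v]
    return sigma, u, v


# Hypothetical noiseless input: the outer product of [1, 2] and [3, 4].
M = [[3.0, 4.0], [6.0, 8.0]]
sigma, u, v = rank1_factor(M)
reconstructed = [[sigma * ui * vj for vj in v] for ui in u]
```

With noisy measurements the matrix is only approximately rank-1, and the dominant factors give a starting point that a later refinement such as bundle adjustment can improve.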
Bilingual is At Least Monolingual (BALM): A Novel Translation Algorithm that Encodes Monolingual Priors
Title | Bilingual is At Least Monolingual (BALM): A Novel Translation Algorithm that Encodes Monolingual Priors |
Authors | Jeffrey Cheng, Chris Callison-Burch |
Abstract | State-of-the-art machine translation (MT) models do not use knowledge of any single language’s structure; this is the equivalent of asking someone to translate from English to German while knowing neither language. BALM is a framework that incorporates monolingual priors into an MT pipeline; by casting input and output languages into embedded space using BERT, we can solve machine translation with much simpler models. We find that English-to-German translation on the Multi30k dataset can be solved with a simple feedforward network under the BALM framework with near-SOTA BLEU scores. |
Tasks | Machine Translation |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1909.01146v1 |
https://arxiv.org/pdf/1909.01146v1.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-is-at-least-monolingual-balm-a |
Repo | |
Framework | |
Quantum-Assisted Clustering Algorithms for NISQ-Era Devices
Title | Quantum-Assisted Clustering Algorithms for NISQ-Era Devices |
Authors | Samuel S. Mendelson, Robert W. Strand, Guy B. Oldaker IV, Jacob M. Farinholt |
Abstract | In the NISQ-era of quantum computing, we should not expect to see quantum devices that provide an exponential improvement in runtime for practical problems, due to the lack of error correction and small number of qubits available. Nevertheless, these devices should be able to provide other performance improvements, particularly when combined with existing classical machines. In this article, we develop several hybrid quantum-classical clustering algorithms that can be employed as subroutines on small, NISQ-era devices. These new hybrid algorithms require a number of qubits that is at most logarithmic in the size of the data, provide performance improvement and/or runtime improvement over their classical counterparts, and do not require a black-box oracle. Consequently, we are able to provide a promising near-term application of NISQ-era devices. |
Tasks | |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08992v3 |
https://arxiv.org/pdf/1904.08992v3.pdf | |
PWC | https://paperswithcode.com/paper/quantum-assisted-clustering-algorithms-for |
Repo | |
Framework | |
A game method for improving the interpretability of convolution neural network
Title | A game method for improving the interpretability of convolution neural network |
Authors | Jinwei Zhao, Qizhou Wang, Fuqiang Zhang, Wanli Qiu, Yufei Wang, Yu Liu, Guo Xie, Weigang Ma, Bin Wang, Xinhong Hei |
Abstract | Many machine learning researchers, especially in deep learning, have long pursued real artificial intelligence. However, deep neural networks are hard to understand and explain, and can sometimes seem almost metaphysical. We believe the reason is that the network is essentially a perceptual model. Therefore, we believe that in order to complete complex intelligent activities from simple perception, it is necessary to construct another interpretable logical network to form accurate and reasonable responses and explanations to external things. Researchers like Bolei Zhou and Quanshi Zhang have found many explanatory rules for deep feature extraction aimed at the feature extraction stage of convolutional neural networks. However, although researchers like Marco Gori have also made great efforts to improve the interpretability of the fully connected layers of the network, the problem remains very difficult. This paper first analyzes the reason. Then a method is proposed for constructing a logical network based on the fully connected layers and extracting the logical relation between the input and output of those layers. A game process between perceptual learning and logical abstract cognitive learning is implemented to improve the interpretability of the deep learning process and the deep learning model. The benefits of our approach are illustrated on benchmark data sets and in real-world experiments. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09090v1 |
https://arxiv.org/pdf/1910.09090v1.pdf | |
PWC | https://paperswithcode.com/paper/a-game-method-for-improving-the |
Repo | |
Framework | |