Paper Group ANR 1561
Online Matrix Completion with Side Information. Weighted Clustering Ensemble: A Review. Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks. Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering. Responsible Scoring Mechanisms Through Function Sampling. Classification of …
Online Matrix Completion with Side Information
Title | Online Matrix Completion with Side Information |
Authors | Mark Herbster, Stephen Pasteris, Lisa Tse |
Abstract | We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The bounds we prove are of the form $\tilde{\mathcal{O}}({\mathcal{D}}/{\gamma^2})$. The term ${1}/{\gamma^2}$ is analogous to the usual margin term in SVM (perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying $m\times n$ matrix into $\mathbf{P} \mathbf{Q}^{\top}$ where the rows of $\mathbf{P}$ are interpreted as “classifiers” in $\Re^d$ and the rows of $\mathbf{Q}$ as “instances” in $\Re^d$, then $\gamma$ is is the maximum (normalized) margin over all factorizations $\mathbf{P} \mathbf{Q}^{\top}$ consistent with the observed matrix. The quasi-dimension term $\mathcal{D}$ measures the quality of side information. In the presence of no side information, $\mathcal{D} = m+n$. However, if the side information is predictive of the underlying factorization of the matrix, then in the best case, $\mathcal{D} \in \mathcal{O}(k + \ell)$ where $k$ is the number of distinct row factors and $\ell$ is the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, the side information is not specified in advance. The results are similar to the transductive setting but in the best case, the quasi-dimension $\mathcal{D}$ is now bounded by $\mathcal{O}(k^2 + \ell^2)$. |
Tasks | Matrix Completion |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07255v2 |
https://arxiv.org/pdf/1906.07255v2.pdf | |
PWC | https://paperswithcode.com/paper/online-matrix-completion-with-side |
Repo | |
Framework | |
Weighted Clustering Ensemble: A Review
Title | Weighted Clustering Ensemble: A Review |
Authors | Mimi Zhang |
Abstract | Clustering ensemble has emerged as a powerful tool for improving both the robustness and the stability of results from individual clustering methods. Weighted clustering ensemble arises naturally from clustering ensemble. One of the arguments for weighted clustering ensemble is that elements (clusterings or clusters) in a clustering ensemble are of different quality, or that objects or features are of varying significance. However, it is not possible to directly apply the weighting mechanisms from classification (supervised) domain to clustering (unsupervised) domain, also because clustering is inherently an ill-posed problem. This paper provides an overview of weighted clustering ensemble by discussing different types of weights, major approaches to determining weight values, and applications of weighted clustering ensemble to complex data. The unifying framework presented in this paper will help clustering practitioners select the most appropriate weighting mechanisms for their own problems. |
Tasks | |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02433v1 |
https://arxiv.org/pdf/1910.02433v1.pdf | |
PWC | https://paperswithcode.com/paper/weighted-clustering-ensemble-a-review |
Repo | |
Framework | |
Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks
Title | Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks |
Authors | Binxuan Huang, Kathleen M. Carley |
Abstract | Aspect level sentiment classification aims to identify the sentiment expressed towards an aspect given a context sentence. Previous neural network based methods largely ignore the syntax structure in one sentence. In this paper, we propose a novel target-dependent graph attention network (TD-GAT) for aspect level sentiment classification, which explicitly utilizes the dependency relationship among words. Using the dependency graph, it propagates sentiment features directly from the syntactic context of an aspect target. In our experiments, we show our method outperforms multiple baselines with GloVe embeddings. We also demonstrate that using BERT representations further substantially boosts the performance. |
Tasks | Sentiment Analysis |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02606v1 |
https://arxiv.org/pdf/1909.02606v1.pdf | |
PWC | https://paperswithcode.com/paper/syntax-aware-aspect-level-sentiment |
Repo | |
Framework | |
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering
Title | Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering |
Authors | Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang |
Abstract | A sparse representation is known to be an effective means to encode precise lexical cues in information retrieval tasks by associating each dimension with a unique n-gram-based feature. However, it has often relied on term frequency (such as tf-idf and BM25) or hand-engineered features that are coarse-grained (document-level) and often task-specific, hence not easily generalizable and not appropriate for fine-grained (word or phrase-level) retrieval. In this work, we propose an effective method for learning a highly contextualized, word-level sparse representation by utilizing rectified self-attention weights on the neighboring n-grams. We kernelize the inner product space during training for memory efficiency without the explicit mapping of the large sparse vectors. We particularly focus on the application of our model to phrase retrieval problem, which has recently shown to be a promising direction for open-domain question answering (QA) and requires lexically sensitive phrase encoding. We demonstrate the effectiveness of the learned sparse representations by not only drastically improving the phrase retrieval accuracy (by more than 4%), but also outperforming all other (pipeline-based) open-domain QA methods with up to x97 inference in SQuADopen and CuratedTrec. |
Tasks | Information Retrieval, Open-Domain Question Answering, Question Answering |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02896v1 |
https://arxiv.org/pdf/1911.02896v1.pdf | |
PWC | https://paperswithcode.com/paper/contextualized-sparse-representation-with-1 |
Repo | |
Framework | |
Responsible Scoring Mechanisms Through Function Sampling
Title | Responsible Scoring Mechanisms Through Function Sampling |
Authors | Abolfazl Asudeh, H. V. Jagadish |
Abstract | Human decision-makers often receive assistance from data-driven algorithmic systems that provide a score for evaluating objects, including individuals. The scores are generated by a function (mechanism) that takes a set of features as input and generates a score.The scoring functions are either machine-learned or human-designed and can be used for different decision purposes such as ranking or classification. Given the potential impact of these scoring mechanisms on individuals’ lives and on society, it is important to make sure these scores are computed responsibly. Hence we need tools for responsible scoring mechanism design. In this paper, focusing on linear scoring functions, we highlight the importance of unbiased function sampling and perturbation in the function space for devising such tools. We provide unbiased samplers for the entire function space, as well as a $\theta$-vicinity around a given function. We then illustrate the value of these samplers for designing effective algorithms in three diverse problem scenarios in the context of ranking. Finally, as a fundamental method for designing responsible scoring mechanisms, we propose a novel approach for approximating the construction of the arrangement of hyperplanes. Despite the exponential complexity of an arrangement in the number of dimensions, using function sampling, our algorithm is linear in the number of samples and hyperplanes, and independent of the number of dimensions. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10073v1 |
https://arxiv.org/pdf/1911.10073v1.pdf | |
PWC | https://paperswithcode.com/paper/responsible-scoring-mechanisms-through |
Repo | |
Framework | |
Classification of Imbalanced Data with a Geometric Digraph Family
Title | Classification of Imbalanced Data with a Geometric Digraph Family |
Authors | Artür Manukyan, Elvan Ceyhan |
Abstract | We use a geometric digraph family called class cover catch digraphs (CCCDs) to tackle the class imbalance problem in statistical classification. CCCDs provide graph theoretic solutions to the class cover problem and have been employed in classification. We assess the classification performance of CCCD classifiers by extensive Monte Carlo simulations, comparing them with other classifiers commonly used in the literature. In particular, we show that CCCD classifiers perform relatively well when one class is more frequent than the other in a two-class setting, an example of the class imbalance problem. We also point out the relationship between class imbalance and class overlapping problems, and their influence on the performance of CCCD classifiers and other classification methods as well as some state-of-the-art algorithms which are robust to class imbalance by construction. Experiments on both simulated and real data sets indicate that CCCD classifiers are robust to the class imbalance problem. CCCDs substantially undersample from the majority class while preserving the information on the discarded points during the undersampling process. Many state-of-the-art methods, however, keep this information by means of ensemble classifiers, but CCCDs yield only a single classifier with the same property, making it both appealing and fast. |
Tasks | |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04564v1 |
http://arxiv.org/pdf/1904.04564v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-imbalanced-data-with-a |
Repo | |
Framework | |
A Regularization Approach for Instance-Based Superset Label Learning
Title | A Regularization Approach for Instance-Based Superset Label Learning |
Authors | Chen Gong, Tongliang Liu, Yuanyan Tang, Jian Yang, Jie Yang, Dacheng Tao |
Abstract | Different from the traditional supervised learning in which each training example has only one explicit label, superset label learning (SLL) refers to the problem that a training example can be associated with a set of candidate labels, and only one of them is correct. Existing SLL methods are either regularization-based or instance-based, and the latter of which has achieved state-of-the-art performance. This is because the latest instance-based methods contain an explicit disambiguation operation that accurately picks up the groundtruth label of each training example from its ambiguous candidate labels. However, such disambiguation operation does not fully consider the mutually exclusive relationship among different candidate labels, so the disambiguated labels are usually generated in a nondiscriminative way, which is unfavorable for the instance-based methods to obtain satisfactory performance. To address this defect, we develop a novel regularization approach for instance-based superset label (RegISL) learning so that our instance-based method also inherits the good discriminative ability possessed by the regularization scheme. Specifically, we employ a graph to represent the training set, and require the examples that are adjacent on the graph to obtain similar labels. More importantly, a discrimination term is proposed to enlarge the gap of values between possible labels and unlikely labels for every training example. As a result, the intrinsic constraints among different candidate labels are deployed, and the disambiguated labels generated by RegISL are more discriminative and accurate than those output by existing instance-based algorithms. The experimental results on various tasks convincingly demonstrate the superiority of our RegISL to other typical SLL methods in terms of both training accuracy and test accuracy. |
Tasks | |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02832v1 |
http://arxiv.org/pdf/1904.02832v1.pdf | |
PWC | https://paperswithcode.com/paper/a-regularization-approach-for-instance-based |
Repo | |
Framework | |
Amplifying the Imitation Effect for Reinforcement Learning of UCAV’s Mission Execution
Title | Amplifying the Imitation Effect for Reinforcement Learning of UCAV’s Mission Execution |
Authors | Gyeong Taek Lee, Chang Ouk Kim |
Abstract | This paper proposes a new reinforcement learning (RL) algorithm that enhances exploration by amplifying the imitation effect (AIE). This algorithm consists of self-imitation learning and random network distillation algorithms. We argue that these two algorithms complement each other and that combining these two algorithms can amplify the imitation effect for exploration. In addition, by adding an intrinsic penalty reward to the state that the RL agent frequently visits and using replay memory for learning the feature state when using an exploration bonus, the proposed approach leads to deep exploration and deviates from the current converged policy. We verified the exploration performance of the algorithm through experiments in a two-dimensional grid environment. In addition, we applied the algorithm to a simulated environment of unmanned combat aerial vehicle (UCAV) mission execution, and the empirical results show that AIE is very effective for finding the UCAV’s shortest flight path to avoid an enemy’s missiles. |
Tasks | Imitation Learning |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.05856v1 |
http://arxiv.org/pdf/1901.05856v1.pdf | |
PWC | https://paperswithcode.com/paper/amplifying-the-imitation-effect-for |
Repo | |
Framework | |
ROMANet: Fine-Grained Reuse-Driven Data Organization and Off-Chip Memory Access Management for Deep Neural Network Accelerators
Title | ROMANet: Fine-Grained Reuse-Driven Data Organization and Off-Chip Memory Access Management for Deep Neural Network Accelerators |
Authors | Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique |
Abstract | Many hardware accelerators have been proposed to improve the computational efficiency of the inference process in deep neural networks (DNNs). However, off-chip memory accesses, being the most energy consuming operation in such architectures, limit the designs from achieving efficiency gains at the full potential. Towards this, we propose ROMANet, a methodology to investigate efficient dataflow patterns for reducing the number of the off-chip accesses. ROMANet adaptively determine the data reuse patterns for each convolutional layer of a network by considering the reuse factor of weights, input activations, and output activations. It also considers the data mapping inside off-chip memory to reduce row buffer misses and increase parallelism. Our experimental results show that ROMANet methodology is able to achieve up to 50% dynamic energy savings in state-of-the-art DNN accelerators. |
Tasks | |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.10222v1 |
http://arxiv.org/pdf/1902.10222v1.pdf | |
PWC | https://paperswithcode.com/paper/romanet-fine-grained-reuse-driven-data |
Repo | |
Framework | |
Road traffic reservoir computing
Title | Road traffic reservoir computing |
Authors | Hiroyasu Ando, Hanten Chang |
Abstract | Reservoir computing derived from recurrent neural networks is more applicable to real world systems than deep learning because of its low computational cost and potential for physical implementation. Specifically, physical reservoir computing, which replaces the dynamics of reservoir units with physical phenomena, has recently received considerable attention. In this study, we propose a method of exploiting the dynamics of road traffic as a reservoir, and numerically confirm its feasibility by applying several prediction tasks based on a simple mathematical model of the traffic flow. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00554v1 |
https://arxiv.org/pdf/1912.00554v1.pdf | |
PWC | https://paperswithcode.com/paper/road-traffic-reservoir-computing |
Repo | |
Framework | |
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization
Title | MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization |
Authors | Rongcheng Lin, Jing Xiao, Jianping Fan |
Abstract | In this paper, we present and discuss a deep mixture model with online knowledge distillation (MOD) for large-scale video temporal concept localization, which is ranked 3rd in the 3rd YouTube-8M Video Understanding Challenge. Specifically, we find that by enabling knowledge sharing with online distillation, fintuning a mixture model on a smaller dataset can achieve better evaluation performance. Based on this observation, in our final solution, we trained and fintuned 12 NeXtVLAD models in parallel with a 2-layer online distillation structure. The experimental results show that the proposed distillation structure can effectively avoid overfitting and shows superior generalization performance. The code is publicly available at: https://github.com/linrongc/solution_youtube8m_v3 |
Tasks | Video Understanding |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12295v1 |
https://arxiv.org/pdf/1910.12295v1.pdf | |
PWC | https://paperswithcode.com/paper/mod-a-deep-mixture-model-with-online |
Repo | |
Framework | |
Detecting Spiky Corruption in Markov Decision Processes
Title | Detecting Spiky Corruption in Markov Decision Processes |
Authors | Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh |
Abstract | Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the agent observes reward different from what it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently “spiky”, the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP, and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that our algorithm can detect the corrupt states and learn the optimal policy despite the corruption. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00452v1 |
https://arxiv.org/pdf/1907.00452v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-spiky-corruption-in-markov-decision |
Repo | |
Framework | |
Towards Understanding the Transferability of Deep Representations
Title | Towards Understanding the Transferability of Deep Representations |
Authors | Hong Liu, Mingsheng Long, Jianmin Wang, Michael I. Jordan |
Abstract | Deep neural networks trained on a wide range of datasets demonstrate impressive transferability. Deep features appear general in that they are applicable to many datasets and tasks. Such property is in prevalent use in real-world applications. A neural network pretrained on large datasets, such as ImageNet, can significantly boost generalization and accelerate training if fine-tuned to a smaller target dataset. Despite its pervasiveness, few effort has been devoted to uncovering the reason of transferability in deep feature representations. This paper tries to understand transferability from the perspectives of improved generalization, optimization and the feasibility of transferability. We demonstrate that 1) Transferred models tend to find flatter minima, since their weight matrices stay close to the original flat region of pretrained parameters when transferred to a similar target dataset; 2) Transferred representations make the loss landscape more favorable with improved Lipschitzness, which accelerates and stabilizes training substantially. The improvement largely attributes to the fact that the principal component of gradient is suppressed in the pretrained parameters, thus stabilizing the magnitude of gradient in back-propagation. 3) The feasibility of transferability is related to the similarity of both input and label. And a surprising discovery is that the feasibility is also impacted by the training stages in that the transferability first increases during training, and then declines. We further provide a theoretical analysis to verify our observations. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12031v1 |
https://arxiv.org/pdf/1909.12031v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-the-transferability-of-1 |
Repo | |
Framework | |
Learnt dynamics generalizes across tasks, datasets, and populations
Title | Learnt dynamics generalizes across tasks, datasets, and populations |
Authors | U. Mahmood, M. M. Rahman, A. Fedorov, Z. Fu, V. D. Calhoun, S. M. Plis |
Abstract | Differentiating multivariate dynamic signals is a difficult learning problem as the feature space may be large yet often only a few training examples are available. Traditional approaches to this problem either proceed from handcrafted features or require large datasets to combat the m » n problem. In this paper, we show that the source of the problem—signal dynamics—can be used to our advantage and noticeably improve classification performance on a range of discrimination tasks when training data is scarce. We demonstrate that self-supervised pre-training guided by signal dynamics produces embedding that generalizes across tasks, datasets, data collection sites, and data distributions. We perform an extensive evaluation of this approach on a range of tasks including simulated data, keyword detection problem, and a range of functional neuroimaging data, where we show that a single embedding learnt on healthy subjects generalizes across a number of disorders, age groups, and datasets. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.03130v1 |
https://arxiv.org/pdf/1912.03130v1.pdf | |
PWC | https://paperswithcode.com/paper/learnt-dynamics-generalizes-across-tasks |
Repo | |
Framework | |
Optimal Learning of Joint Alignments with a Faulty Oracle
Title | Optimal Learning of Joint Alignments with a Faulty Oracle |
Authors | Kasper Green Larsen, Michael Mitzenmacher, Charalampos E. Tsourakakis |
Abstract | We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover $n$ discrete variables $g_i \in {0, \ldots, k-1}$ (up to some global offset) given noisy observations of a set of their pairwise differences ${(g_i - g_j) \bmod k}$; specifically, with probability $\frac{1}{k}+\delta$ for some $\delta > 0$ one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer. We consider a learning-based formulation where one can perform a query to observe a pairwise difference, and the goal is to perform as few queries as possible while obtaining the exact joint alignment. We provide an easy-to-implement, time efficient algorithm that performs $O\big(\frac{n \lg n}{k \delta^2}\big)$ queries, and recovers the joint alignment with high probability. We also show that our algorithm is optimal by proving a general lower bound that holds for all non-adaptive algorithms. Our work improves significantly recent work by Chen and Cand'{e}s \cite{chen2016projected}, who view the problem as a constrained principal components analysis problem that can be solved using the power method. Specifically, our approach is simpler both in the algorithm and the analysis, and provides additional insights into the problem structure. |
Tasks | |
Published | 2019-09-21 |
URL | https://arxiv.org/abs/1909.09912v1 |
https://arxiv.org/pdf/1909.09912v1.pdf | |
PWC | https://paperswithcode.com/paper/190909912 |
Repo | |
Framework | |