Paper Group ANR 444
Progressive Identification of True Labels for Partial-Label Learning
Title | Progressive Identification of True Labels for Partial-Label Learning |
Authors | Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, Masashi Sugiyama |
Abstract | Partial-label learning is one of the important weakly supervised learning problems, where each training example is equipped with a set of candidate labels that contains the true label. Most existing methods elaborately design learning objectives as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling to big data. The goal of this paper is to propose a novel framework of partial-label learning without implicit assumptions on the model or optimization algorithm. More specifically, we propose a general estimator of the classification risk, theoretically analyze classifier consistency, and establish an estimation error bound. We then explore a progressive identification method for approximately minimizing the proposed risk estimator, where the update of the model and the identification of true labels are conducted in a seamless manner. The resulting algorithm is model-independent and loss-independent, and compatible with stochastic optimization. Thorough experiments demonstrate that it sets a new state of the art. |
Tasks | Stochastic Optimization |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08053v1 |
PDF | https://arxiv.org/pdf/2002.08053v1.pdf |
PWC | https://paperswithcode.com/paper/progressive-identification-of-true-labels-for |
Repo | |
Framework | |
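The progressive-identification idea above lends itself to a compact training loop. Below is a minimal PyTorch sketch of the general recipe the abstract describes (a weighted loss over candidate labels, with weights re-estimated from the model's own predictions); the function names and the uniform initialization are our assumptions, not the authors' exact algorithm.

```python
import torch
import torch.nn.functional as F

def partial_label_loss(logits, candidate_mask, weights):
    """Weighted cross-entropy restricted to each example's candidate set.

    logits:         (batch, num_classes) model outputs
    candidate_mask: (batch, num_classes) 1 where a label is a candidate
    weights:        (batch, num_classes) current belief that each candidate
                    is the true label (rows sum to 1 over candidates)
    """
    log_probs = F.log_softmax(logits, dim=1)
    return -(weights * candidate_mask * log_probs).sum(dim=1).mean()

@torch.no_grad()
def update_weights(logits, candidate_mask):
    """Progressive identification: renormalize the model's confidence over
    the candidate set to obtain the next round of label weights."""
    probs = F.softmax(logits, dim=1) * candidate_mask
    return probs / probs.sum(dim=1, keepdim=True)
```

Starting from uniform weights over each candidate set, one alternates an SGD step on `partial_label_loss` with a call to `update_weights`, so model updates and label identification proceed seamlessly, as the abstract states.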
LoCEC: Local Community-based Edge Classification in Large Online Social Networks
Title | LoCEC: Local Community-based Edge Classification in Large Online Social Networks |
Authors | Chonggang Song, Qian Lin, Guohui Ling, Zongyi Zhang, Hongzhao Chen, Jun Liao, Chuan Chen |
Abstract | Relationships in online social networks often imply social connections in the real world. An accurate understanding of relationship types benefits many applications, e.g., social advertising and recommendation. Several recent approaches attempt to classify user relationships into predefined types with the help of pre-labeled relationships or abundant interaction features on relationships. Unfortunately, both relationship feature data and label data are very sparse in real social platforms like WeChat, rendering existing methods inapplicable. In this paper, we present an in-depth analysis of WeChat relationships to identify the major challenges for the relationship classification task. To tackle these challenges, we propose a Local Community-based Edge Classification (LoCEC) framework that classifies user relationships in a social network into real-world social connection types. LoCEC enforces a three-phase process, namely local community detection, community classification, and relationship classification, to address the sparsity of relationship features and relationship labels. Moreover, LoCEC is designed to handle large-scale networks by allowing parallel and distributed processing. We conduct extensive experiments on the real-world WeChat network with hundreds of billions of edges to validate the effectiveness and efficiency of LoCEC. |
Tasks | Community Detection, Local Community Detection |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04180v2 |
PDF | https://arxiv.org/pdf/2002.04180v2.pdf |
PWC | https://paperswithcode.com/paper/locec-local-community-based-edge |
Repo | |
Framework | |
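As a rough illustration of the three-phase flow described above, the sketch below uses ego networks as stand-in local communities and a stub community classifier; LoCEC's actual community detection, classifiers, and distributed execution are not reproduced here.

```python
import networkx as nx

def locec_sketch(G, classify_community):
    """Toy three-phase pipeline: (1) local community detection around each
    node, (2) community classification, (3) labeling each edge with the
    type of a community that contains it."""
    edge_labels = {}
    for u in G.nodes():
        ego = nx.ego_graph(G, u)            # phase 1: a local community (stand-in)
        ctype = classify_community(ego)     # phase 2: e.g., "family" / "colleague"
        for e in ego.edges():
            edge_labels.setdefault(tuple(sorted(e)), ctype)  # phase 3
    return edge_labels

# Example with a hypothetical classifier that keys on community density.
G = nx.karate_club_graph()
labels = locec_sketch(G, lambda c: "close" if nx.density(c) > 0.5 else "loose")
```

The phase structure is the point: per-edge features may be sparse, but community-level signals aggregated in phases 1-2 can still label every edge in phase 3.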
Preferential Batch Bayesian Optimization
Title | Preferential Batch Bayesian Optimization |
Authors | Eero Siivola, Akash Kumar Dhaka, Michael Riis Andersen, Javier Gonzalez, Pablo Garcia Moreno, Aki Vehtari |
Abstract | Most research in Bayesian optimization (BO) has focused on direct feedback scenarios, where one has access to exact, or perturbed, values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine-learning hyperparameter configuration problems. However, in domains such as modelling human preferences, A/B tests, or recommender systems, there is a need for methods that can replace direct feedback with preferential feedback, obtained via rankings or pairwise comparisons. In this work, we present Preferential Batch Bayesian Optimization (PBBO), a new framework for finding the optimum of a latent function of interest, given any type of parallel preferential feedback for a group of two or more points. We do so by using a Gaussian process model with a likelihood specially designed to enable parallel and efficient data collection mechanisms, which are key in modern machine learning. We show how the acquisitions developed under this framework generalize and augment previous approaches in Bayesian optimization, expanding the use of these techniques to a wider range of domains. An extensive simulation study shows the benefits of this approach, both with simulated functions and four real data sets. |
Tasks | Recommendation Systems |
Published | 2020-03-25 |
URL | https://arxiv.org/abs/2003.11435v1 |
PDF | https://arxiv.org/pdf/2003.11435v1.pdf |
PWC | https://paperswithcode.com/paper/preferential-batch-bayesian-optimization |
Repo | |
Framework | |
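To make the notion of preferential batch feedback concrete, here is a small numpy sketch of the kind of oracle PBBO consumes: instead of a function value, the learner only observes which point in a batch is preferred. The noise model and function are illustrative; the paper's GP likelihood and acquisition functions are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def preferential_batch_feedback(f, batch, noise=0.1):
    """Return the index of the preferred (lowest noisy latent value) point
    in a batch, i.e., a ranking-style observation rather than f(x) itself."""
    noisy = np.array([f(x) for x in batch]) + noise * rng.standard_normal(len(batch))
    return int(np.argmin(noisy))

f = lambda x: (x - 0.3) ** 2          # latent objective, never observed directly
batch = np.array([0.0, 0.4, 0.9])     # a parallel query of three points
winner = preferential_batch_feedback(f, batch)
```

A preferential BO method must infer the latent f from many such winner indices, which is exactly the setting the likelihood in PBBO is designed for.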
An Ontology-based Context Model in Intelligent Environments
Title | An Ontology-based Context Model in Intelligent Environments |
Authors | Tao Gu, Xiao Hang Wang, Hung Keng Pung, Da Qing Zhang |
Abstract | Computing is becoming increasingly mobile and pervasive; these changes imply that applications and services must be aware of and adapt to their changing contexts in highly dynamic environments. Today, building context-aware systems is a complex task due to the lack of appropriate infrastructure support in intelligent environments. A context-aware infrastructure requires an appropriate context model to represent, manipulate, and access context information. In this paper, we propose a formal context model based on an OWL ontology to address issues including semantic context representation, context reasoning and knowledge sharing, context classification, context dependency, and quality of context. The main benefit of this model is the ability to reason about various contexts. Based on our context model, we also present a Service-Oriented Context-Aware Middleware (SOCAM) architecture for building context-aware services. |
Tasks | |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.05055v1 |
PDF | https://arxiv.org/pdf/2003.05055v1.pdf |
PWC | https://paperswithcode.com/paper/an-ontology-based-context-model-in |
Repo | |
Framework | |
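A minimal flavor of an OWL-based context model can be given with rdflib; the namespace and class names below are hypothetical, chosen only to illustrate how context facts ("Alice is located in the living room") become triples that a reasoner can work with.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

CTX = Namespace("http://example.org/context#")   # hypothetical ontology namespace

g = Graph()
g.bind("ctx", CTX)

# Classes and a property of the context ontology.
g.add((CTX.Person, RDF.type, OWL.Class))
g.add((CTX.Location, RDF.type, OWL.Class))
g.add((CTX.locatedIn, RDF.type, OWL.ObjectProperty))

# One concrete context fact.
g.add((CTX.Alice, RDF.type, CTX.Person))
g.add((CTX.LivingRoom, RDF.type, CTX.Location))
g.add((CTX.Alice, CTX.locatedIn, CTX.LivingRoom))

print(g.serialize(format="turtle"))
```

Because the facts live in a shared ontology, different context providers can contribute triples and a middleware such as SOCAM can reason over them uniformly.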
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Title | Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization |
Authors | Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan |
Abstract | Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners wishing to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by Lei & Jordan (2016) and the SARAH algorithm of Nguyen et al. (2017), we propose the Geometrized SARAH algorithm for non-convex finite-sum and stochastic optimization. We prove that our algorithm achieves adaptivity to both the magnitude of the target accuracy and the Polyak-Łojasiewicz (PL) constant, if present. In addition, it simultaneously achieves the best available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives. |
Tasks | Stochastic Optimization |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05359v1 |
PDF | https://arxiv.org/pdf/2002.05359v1.pdf |
PWC | https://paperswithcode.com/paper/adaptivity-of-stochastic-gradient-methods-for |
Repo | |
Framework | |
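For readers unfamiliar with SARAH, the recursion at the core of the algorithm is easy to state in code. The sketch below uses a geometrically distributed inner-loop length as a stand-in for the paper's geometrization; the exact Geom-SARAH schedule and step sizes are in the paper, so treat this as an assumption-laden outline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sarah_outer_loop(grad_i, n, x, lr=0.1, p=0.05):
    """One outer loop of a SARAH-style estimator on a finite sum of n terms.

    grad_i(x, i): stochastic gradient of component i at x
    p:            parameter of the geometric inner-loop length (a stand-in
                  for the "geometrization" of Lei & Jordan)
    """
    v = np.mean([grad_i(x, i) for i in range(n)], axis=0)  # full gradient anchor
    x_prev, x = x, x - lr * v
    for _ in range(rng.geometric(p)):                      # random stopping time
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_prev, i) + v           # SARAH recursion
        x_prev, x = x, x - lr * v
    return x
```

The recursive estimator keeps variance low without storing per-component gradients, and the random loop length is what lets the analysis adapt to unknown problem constants.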
A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast
Title | A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast |
Authors | Zeke Xie, Issei Sato, Masashi Sugiyama |
Abstract | Stochastic optimization algorithms, such as Stochastic Gradient Descent (SGD) and its variants, are mainstream methods for training deep networks in practice. However, the theoretical mechanism behind gradient noise remains to be fully understood. Deep learning is known to find flat minima with a large neighboring region in parameter space from which each weight vector has similar small error. In this paper, we focus on a fundamental problem in deep learning: how can deep learning usually find flat minima among so many minima? To answer this question, we develop a density diffusion theory (DDT) to reveal the fundamental dynamical mechanism of SGD and deep learning. More specifically, we study how the escape time from loss valleys to the outside of valleys depends on minima sharpness, gradient noise, and hyperparameters. One of the most interesting findings is that stochastic gradient noise from SGD can help escape from sharp minima exponentially faster than from flat minima, while white noise can only help escape from sharp minima polynomially faster than from flat minima. We also find that large-batch training requires exponentially many iterations to pass through sharp minima and find flat minima. We present direct empirical evidence supporting the proposed theoretical results. |
Tasks | Stochastic Optimization |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03495v5 |
PDF | https://arxiv.org/pdf/2002.03495v5.pdf |
PWC | https://paperswithcode.com/paper/a-diffusion-theory-for-deep-learning-dynamics |
Repo | |
Framework | |
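A toy sandbox for the escape-time question the abstract studies: noisy gradient descent in a quadratic valley, counting steps until the iterate clears a barrier. This uses plain white noise only, so it can probe the dependence on sharpness and noise scale but does not reproduce the paper's SGD noise model or its exponential/polynomial separation.

```python
import numpy as np

rng = np.random.default_rng(0)

def escape_time(curvature, noise_std, lr=0.01, barrier=1.0, max_steps=10**6):
    """Steps of noisy gradient descent on loss(x) = curvature/2 * x^2,
    starting at the minimum, until |x| first exceeds the barrier."""
    x = 0.0
    for t in range(max_steps):
        x += -lr * curvature * x + lr * noise_std * rng.standard_normal()
        if abs(x) > barrier:
            return t
    return max_steps          # did not escape within the budget

for c in (2.0, 50.0):         # flat vs sharp valley
    for s in (5.0, 10.0):     # small vs large injected noise
        print(f"curvature={c}, noise={s}: {escape_time(c, s)} steps")
```

Replacing the constant `noise_std` with a gradient-dependent (minibatch-style) noise is the natural next experiment suggested by the theory.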
A Kernel Mean Embedding Approach to Reducing Conservativeness in Stochastic Programming and Control
Title | A Kernel Mean Embedding Approach to Reducing Conservativeness in Stochastic Programming and Control |
Authors | Jia-Jie Zhu, Bernhard Schölkopf, Moritz Diehl |
Abstract | We apply kernel mean embedding methods to sample-based stochastic optimization and control. Specifically, we use the reduced-set expansion method as a way to discard sampled scenarios. The effect of such constraint removal is improved optimality and decreased conservativeness. This is achieved by solving a distributional-distance-regularized optimization problem. We demonstrate that this optimization formulation is well-motivated in theory, computationally tractable, and effective in numerical algorithms. |
Tasks | Stochastic Optimization |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10398v1 |
PDF | https://arxiv.org/pdf/2001.10398v1.pdf |
PWC | https://paperswithcode.com/paper/a-kernel-mean-embedding-approach-to-reducing |
Repo | |
Framework | |
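The distributional distance at the heart of this approach is the distance between kernel mean embeddings, i.e., the maximum mean discrepancy (MMD). A minimal numpy version is below; it only checks how faithful a candidate reduced scenario set is, whereas the paper's reduced-set expansion chooses and reweights that set inside a regularized optimization problem.

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    """Gaussian-kernel Gram matrix between row-sample sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD between the empirical kernel mean embeddings of X and Y."""
    return (rbf_gram(X, X, gamma).mean()
            - 2 * rbf_gram(X, Y, gamma).mean()
            + rbf_gram(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
scenarios = rng.standard_normal((200, 2))   # all sampled scenarios
reduced = scenarios[:50]                    # a candidate reduced set
print(mmd2(scenarios, reduced))             # small value => little distributional loss
```

Discarding scenarios removes constraints from the sample-based program; keeping the MMD to the full sample small is what controls how much conservativeness is traded away.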
Symplectic networks: Intrinsic structure-preserving networks for identifying Hamiltonian systems
Title | Symplectic networks: Intrinsic structure-preserving networks for identifying Hamiltonian systems |
Authors | Pengzhan Jin, Aiqing Zhu, George Em Karniadakis, Yifa Tang |
Abstract | This work presents a framework for constructing neural networks that preserve the symplectic structure, so-called symplectic networks (SympNets). With symplectic networks, we show numerical results for (i) solving Hamiltonian systems by learning abundant data points over the phase space, and (ii) predicting phase flows by learning a series of points depending on time. All the experiments point out that symplectic networks perform much better than fully-connected networks without any prior information, especially in the prediction task, which conventional numerical methods are unable to handle. |
Tasks | |
Published | 2020-01-11 |
URL | https://arxiv.org/abs/2001.03750v1 |
PDF | https://arxiv.org/pdf/2001.03750v1.pdf |
PWC | https://paperswithcode.com/paper/symplectic-networks-intrinsic-structure |
Repo | |
Framework | |
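One way to see how a network can be symplectic by construction: shear maps that update only q using an elementwise function of p (or vice versa) have diagonal, hence symmetric, Jacobian off-diagonal blocks, so they preserve the symplectic form exactly. The PyTorch sketch below composes such shears; it illustrates the structure-preserving idea but is not the exact SympNet module zoo from the paper.

```python
import torch
import torch.nn as nn

class UpShear(nn.Module):
    """(p, q) -> (p, q + a * tanh(w * p + b)), elementwise: exactly symplectic."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))
        self.w = nn.Parameter(torch.ones(dim))
        self.b = nn.Parameter(torch.zeros(dim))
    def forward(self, p, q):
        return p, q + self.a * torch.tanh(self.w * p + self.b)

class LowShear(nn.Module):
    """(p, q) -> (p + a * tanh(w * q + b), q), the mirrored shear."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))
        self.w = nn.Parameter(torch.ones(dim))
        self.b = nn.Parameter(torch.zeros(dim))
    def forward(self, p, q):
        return p + self.a * torch.tanh(self.w * q + self.b), q

class SympNetSketch(nn.Module):
    """A composition of symplectic maps is symplectic, so the whole net is."""
    def __init__(self, dim, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [m for _ in range(depth) for m in (UpShear(dim), LowShear(dim))])
    def forward(self, p, q):
        for blk in self.blocks:
            p, q = blk(p, q)
        return p, q
```

Training such a net on phase-space pairs (states and their time-evolved images) bakes the conservation structure into the model rather than hoping a generic MLP learns it.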
A^2-GCN: An Attribute-aware Attentive GCN Model for Recommendation
Title | A^2-GCN: An Attribute-aware Attentive GCN Model for Recommendation |
Authors | Fan Liu, Zhiyong Cheng, Lei Zhu, Chenghao Liu, Liqiang Nie |
Abstract | As important side information, attributes have been widely exploited in existing recommender systems for better performance. In real-world scenarios, it is common that some attributes of items/users are missing (e.g., some movies miss the genre data). Prior studies usually use a default value (i.e., “other”) to represent the missing attribute, resulting in sub-optimal performance. To address this problem, in this paper, we present an attribute-aware attentive graph convolution network (A^2-GCN). In particular, we first construct a graph, whereby users, items, and attributes are three types of nodes and their associations are edges. Thereafter, we leverage the graph convolution network to characterize the complicated interactions among <users, items, attributes>. To learn the node representations, we turn to a message-passing strategy to aggregate the messages passed from the other directly linked types of nodes (e.g., a user or an attribute). In this way, we can incorporate associated attributes to strengthen the user and item representations, and thus naturally solve the attribute-missing problem. Considering that, for different users, the attributes of an item have different influences on their preference for this item, we design a novel attention mechanism to filter the message passed from an item to a target user by considering the attribute information. Extensive experiments have been conducted on several publicly accessible datasets to justify our model. Results show that our model outperforms several state-of-the-art methods, demonstrating the effectiveness of our attention method. |
Tasks | Recommendation Systems |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09086v1 |
PDF | https://arxiv.org/pdf/2003.09086v1.pdf |
PWC | https://paperswithcode.com/paper/a2-gcn-an-attribute-aware-attentive-gcn-model |
Repo | |
Framework | |
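A condensed sketch of one attribute-aware attentive aggregation step, to ground the message-passing description above; the dimensions, the way attributes are folded into messages, and the attention parameterization are simplifications of the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttrAttentiveAggregate(nn.Module):
    """A user node aggregates messages from its item neighbors, with
    attention scores conditioned on the user and attribute-enriched items."""
    def __init__(self, dim):
        super().__init__()
        self.att = nn.Linear(2 * dim, 1)

    def forward(self, user, items, item_attrs):
        # user: (dim,); items: (n, dim); item_attrs: (n, n_attr, dim)
        msg = items + item_attrs.mean(dim=1)        # fold attributes into messages
        pair = torch.cat([user.expand(msg.size(0), -1), msg], dim=-1)
        alpha = F.softmax(self.att(pair).squeeze(-1), dim=0)   # neighbor attention
        return user + (alpha.unsqueeze(-1) * msg).sum(dim=0)   # updated user node
```

Because missing attributes simply contribute no nodes (rather than a fake "other" value), the same aggregation handles incomplete attribute data gracefully.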
TensorShield: Tensor-based Defense Against Adversarial Attacks on Images
Title | TensorShield: Tensor-based Defense Against Adversarial Attacks on Images |
Authors | Negin Entezari, Evangelos E. Papalexakis |
Abstract | Recent studies have demonstrated that machine learning approaches like deep neural networks (DNNs) are easily fooled by adversarial attacks: subtle and imperceptible perturbations of the data are able to change the result of deep neural networks. Such vulnerability of machine learning methods raises many concerns, especially in domains where security is an important factor. Therefore, it is crucial to design defense mechanisms against adversarial attacks. For the task of image classification, unnoticeable perturbations mostly occur in the high-frequency spectrum of the image. In this paper, we utilize tensor decomposition techniques as a preprocessing step to find a low-rank approximation of images, which can significantly discard high-frequency perturbations. Recently, a defense framework called Shield was shown to “vaccinate” Convolutional Neural Networks (CNNs) against adversarial examples by performing random-quality JPEG compressions on local patches of images on the ImageNet dataset. Our tensor-based defense mechanism outperforms the SLQ method from Shield by 14% against Fast Gradient Sign Method (FGSM) adversarial attacks, while maintaining comparable speed. |
Tasks | Image Classification |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.10252v1 |
PDF | https://arxiv.org/pdf/2002.10252v1.pdf |
PWC | https://paperswithcode.com/paper/tensorshield-tensor-based-defense-against |
Repo | |
Framework | |
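The preprocessing idea is easy to prototype with tensorly: compute a low-rank CP (PARAFAC) approximation of the image tensor and feed the reconstruction to the classifier. The rank below is arbitrary and the random image is a stand-in; the paper's exact decomposition choices and patch handling may differ.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def low_rank_shield(image, rank=10):
    """Replace an (H, W, C) image with a rank-`rank` CP approximation,
    discarding much of the high-frequency content where adversarial
    perturbations tend to live."""
    cp = parafac(tl.tensor(image.astype(np.float64)), rank=rank, init="random")
    return np.clip(tl.cp_to_tensor(cp), 0.0, 255.0)

img = np.random.rand(64, 64, 3) * 255     # stand-in for a real image
defended = low_rank_shield(img)           # goes to the classifier instead of img
```

Like JPEG compression in Shield, this is purely a test-time input transformation, so it composes with any pretrained CNN.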
What the [MASK]? Making Sense of Language-Specific BERT Models
Title | What the [MASK]? Making Sense of Language-Specific BERT Models |
Authors | Debora Nozza, Federico Bianchi, Dirk Hovy |
Abstract | Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT (Bidirectional Encoder Representations from Transformers), which enables researchers to obtain state-of-the-art performance on numerous NLP tasks by fine-tuning the representations on their data set and task, without the need to develop and train highly specific architectures. The authors also released multilingual BERT (mBERT), a model trained on a corpus of 104 languages, which can serve as a universal language model. This model obtained impressive results on a zero-shot cross-lingual natural language inference task. Driven by the potential of BERT models, the NLP community has started to investigate and generate an abundance of BERT models that are trained on a particular language and tested on a specific data domain and task. This allows us to evaluate the true potential of mBERT as a universal language model, by comparing it to the performance of these more specific models. This paper presents the current state of the art in language-specific BERT models, providing an overall picture with respect to different dimensions (i.e., architectures, data domains, and tasks). Our aim is to provide an immediate and straightforward overview of the commonalities and differences between language-specific BERT models and mBERT. We also provide an interactive and constantly updated website that can be used to explore the information we have collected, at https://bertlang.unibocconi.it. |
Tasks | Language Modelling |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02912v1 |
PDF | https://arxiv.org/pdf/2003.02912v1.pdf |
PWC | https://paperswithcode.com/paper/what-the-mask-making-sense-of-language |
Repo | |
Framework | |
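To see the comparison the paper systematizes, one can query mBERT and a language-specific BERT side by side with the transformers library; the Italian model identifier below is one illustrative example of the many language-specific models the paper surveys.

```python
from transformers import pipeline

# mBERT: one model covering 104 languages.
mbert = pipeline("fill-mask", model="bert-base-multilingual-cased")
print(mbert("Roma è la [MASK] d'Italia.")[0]["token_str"])

# A language-specific alternative (illustrative Italian BERT).
it_bert = pipeline("fill-mask", model="dbmdz/bert-base-italian-cased")
print(it_bert("Roma è la [MASK] d'Italia.")[0]["token_str"])
```

Running the same masked sentence through both models is the smallest version of the mBERT-versus-language-specific evaluation the paper aggregates across architectures, domains, and tasks.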
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Title | I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents |
Authors | Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam |
Abstract | Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the divide between these two domains in the setting of a rich multi-player text-based fantasy environment where agents and humans engage in both actions and dialogue. Specifically, we train a goal-oriented model with reinforcement learning against an imitation-learned “chit-chat” model with two approaches: the policy either learns to pick a topic or learns to pick an utterance given the top-K utterances from the chit-chat model. We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02878v2 |
PDF | https://arxiv.org/pdf/2002.02878v2.pdf |
PWC | https://paperswithcode.com/paper/i-love-your-chain-mail-making-knights-smile-1 |
Repo | |
Framework | |
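The second of the two approaches (picking an utterance from the chit-chat model's top-K) reduces to a small REINFORCE step; the sketch below assumes a policy network producing a state embedding and a hypothetical reward function, and omits the environment and the imitation-learned generator.

```python
import torch

def reinforce_topk_step(policy, state, topk_utt_embs, optimizer, env_reward):
    """Score K candidate utterances against the state, sample one, and
    reinforce it with the goal-achievement reward."""
    scores = topk_utt_embs @ policy(state)            # (K,) candidate scores
    dist = torch.distributions.Categorical(logits=scores)
    action = dist.sample()
    reward = env_reward(action.item())                # e.g., did the goal succeed?
    loss = -dist.log_prob(action) * reward            # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return action
```

Constraining the action space to the chit-chat model's top-K keeps the RL policy fluent while the reward steers it toward the goal.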
Modality-Balanced Models for Visual Dialogue
Title | Modality-Balanced Models for Visual Dialogue |
Authors | Hyounghun Kim, Hao Tan, Mohit Bansal |
Abstract | The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response in the dialogue. However, via manual analysis, we find that a large number of conversational questions can be answered by only looking at the image, without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that, for this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns in the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better on the primary normalized discounted cumulative gain (NDCG) task metric, which allows multiple correct answers. Hence, this observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and combine their complementary abilities for a more balanced multimodal model. We present multiple methods for this integration of the two models, via ensemble and consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics), and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics. |
Tasks | Visual Dialog |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06354v1 |
PDF | https://arxiv.org/pdf/2001.06354v1.pdf |
PWC | https://paperswithcode.com/paper/modality-balanced-models-for-visual-dialogue |
Repo | |
Framework | |
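A stripped-down view of the fusion idea: average the answer logits of the image-only and joint models, occasionally dropping the joint branch during training so the fused model cannot lean entirely on dialogue history. This is a rough sketch of the spirit of consensus dropout fusion, not the paper's exact shared-parameter design.

```python
import torch

def fused_logits(img_logits, joint_logits, p_drop=0.3, training=True):
    """Combine an image-only model and an image-history model; randomly
    dropping the history-aware branch regularizes against memorization."""
    if training and torch.rand(()).item() < p_drop:
        return img_logits
    return 0.5 * (img_logits + joint_logits)
```

At inference time both branches stay active, so the fused model keeps the joint model's strengths while inheriting the image-only model's NDCG robustness.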
Ensemble based discriminative models for Visual Dialog Challenge 2018
Title | Ensemble based discriminative models for Visual Dialog Challenge 2018 |
Authors | Shubham Agarwal, Raghav Goyal |
Abstract | This manuscript describes our approach for the Visual Dialog Challenge 2018. We use an ensemble of three discriminative models with different encoders and decoders for our final submission. Our best-performing model on the ‘test-std’ split achieves an NDCG score of 55.46 and an MRR of 63.77, securing third position in the challenge. |
Tasks | Visual Dialog |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05865v1 |
PDF | https://arxiv.org/pdf/2001.05865v1.pdf |
PWC | https://paperswithcode.com/paper/ensemble-based-discriminative-models-for |
Repo | |
Framework | |
Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing
Title | Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing |
Authors | Zhe Zeng, Paolo Morettin, Fanqi Yan, Antonio Vergari, Guy Van den Broeck |
Abstract | Weighted model integration (WMI) is a very appealing framework for probabilistic inference: it can express the complex dependencies of real-world problems where variables are both continuous and discrete, via the language of Satisfiability Modulo Theories (SMT), and it can compute probabilistic queries with complex logical and arithmetic constraints. Yet, existing WMI solvers are not ready to scale to these problems. They either ignore the intrinsic dependency structure of the problem altogether, or they are limited to overly restrictive structures. To narrow this gap, we derive a factorized formalism of WMI enabling us to devise a scalable WMI solver based on message passing, MP-WMI. Namely, MP-WMI is the first WMI solver that can: 1) perform exact inference on the full class of tree-structured WMI problems; 2) compute all marginal densities in linear time; 3) amortize inference across queries. Experimental results show that our solver dramatically outperforms existing WMI solvers on a large set of benchmarks. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.00126v1 |
PDF | https://arxiv.org/pdf/2003.00126v1.pdf |
PWC | https://paperswithcode.com/paper/scaling-up-hybrid-probabilistic-inference |
Repo | |
Framework | |
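A miniature of message passing with integration, the mechanic behind MP-WMI: on a tree (here a single edge between continuous variables x and y), marginals are obtained by integrating factor messages over SMT-style feasible regions. The weight function and constraints are toy choices for illustration, not the paper's benchmarks.

```python
from scipy.integrate import quad

# WMI-style toy problem: weight w(x, y) = x * y on the region {0 < x < y < 1}.

def message_x_to_y(y):
    """Integrate the x factor over x's feasible interval given y."""
    val, _ = quad(lambda x: x, 0.0, y)
    return val                                   # analytically y**2 / 2

Z, _ = quad(lambda y: y * message_x_to_y(y), 0.0, 1.0)   # partition function = 1/8

def marginal_y(y):
    """Normalized belief at y, reusable for any subsequent query on y."""
    return y * message_x_to_y(y) / Z

print(Z, marginal_y(0.5))
```

On a real tree-structured WMI problem the same pattern runs once per edge, which is why all marginals come out in linear time and messages can be reused (amortized) across queries.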