April 2, 2020

Paper Group ANR 126

Spike-Timing-Dependent Inference of Synaptic Weights

Title Spike-Timing-Dependent Inference of Synaptic Weights
Authors Nasir Ahmad, Luca Ambrogioni, Marcel A. J. van Gerven
Abstract A potential solution to the weight transport problem, which questions the biological plausibility of the backpropagation of error algorithm, is proposed. We derive our method based upon an (approximate) analysis of the dynamics of leaky integrate-and-fire neurons. We thereafter validate our method and show that the use of spike timing alone out-competes existing biologically plausible methods for synaptic weight inference in spiking neural network models. Furthermore, our proposed method is also more flexible, being applicable to any spiking neuron model, is conservative in how many parameters are required for implementation, and can be deployed in an online fashion with minimal computational overhead. These features, together with its biological plausibility, make it an attractive candidate technique for weight inference at single synapses.
Tasks
Published 2020-03-09
URL https://arxiv.org/abs/2003.03988v1
PDF https://arxiv.org/pdf/2003.03988v1.pdf
PWC https://paperswithcode.com/paper/spike-timing-dependent-inference-of-synaptic
Repo
Framework

Circle Loss: A Unified Perspective of Pair Similarity Optimization

Title Circle Loss: A Unified Perspective of Pair Similarity Optimization
Authors Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, Yichen Wei
Abstract This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$. We find a majority of loss functions, including the triplet loss and the softmax plus cross-entropy loss, embed $s_n$ and $s_p$ into similarity pairs and seek to reduce $(s_n-s_p)$. Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal. Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized. To this end, we simply re-weight each similarity to highlight the less-optimized similarity scores. It results in a Circle loss, which is named due to its circular decision boundary. The Circle loss has a unified formula for two elemental deep feature learning approaches, i.e. learning with class-level labels and pair-wise labels. Analytically, we show that the Circle loss offers a more flexible optimization approach towards a more definite convergence target, compared with the loss functions optimizing $(s_n-s_p)$. Experimentally, we demonstrate the superiority of the Circle loss on a variety of deep feature learning tasks. On face recognition, person re-identification, as well as several fine-grained image retrieval datasets, the achieved performance is on par with the state of the art.
Tasks Face Recognition, Image Retrieval, Person Re-Identification
Published 2020-02-25
URL https://arxiv.org/abs/2002.10857v1
PDF https://arxiv.org/pdf/2002.10857v1.pdf
PWC https://paperswithcode.com/paper/circle-loss-a-unified-perspective-of-pair
Repo
Framework
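
The re-weighting idea in the abstract above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the margin `m`, the scale `gamma`, and the specific optima and margins derived from `m` are assumptions that the abstract does not state. Each similarity score gets a weight that grows with its distance from its optimum, so less-optimized scores dominate the loss.

```python
import math

def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def circle_loss(s_p, s_n, m=0.25, gamma=64.0):
    """Re-weighted pair-similarity loss for positive scores s_p and negative scores s_n."""
    # Assumed optima O_p = 1 + m, O_n = -m and margins Delta_p = 1 - m, Delta_n = m.
    # The weight max(0, ...) is large when a score is far from its optimum.
    logit_p = [-gamma * max(0.0, 1.0 + m - s) * (s - (1.0 - m)) for s in s_p]
    logit_n = [gamma * max(0.0, s + m) * (s - m) for s in s_n]
    z = _logsumexp(logit_n) + _logsumexp(logit_p)
    # softplus(z) = log(1 + exp(z)), written to avoid overflow for large z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

well_separated = circle_loss([0.9, 0.8], [0.1, 0.0])
poorly_separated = circle_loss([0.2], [0.8])
```

With these hypothetical scores, the well-separated pair set should yield a much smaller loss than the poorly separated one.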

Insertion-Deletion Transformer

Title Insertion-Deletion Transformer
Authors Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan
Abstract We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation. The model consists of two phases that are executed iteratively, 1) an insertion phase and 2) a deletion phase. The insertion phase parameterizes a distribution of insertions on the current output hypothesis, while the deletion phase parameterizes a distribution of deletions over the current output hypothesis. The training method is a principled and simple algorithm, where the deletion model obtains its signal directly on-policy from the insertion model output. We demonstrate the effectiveness of our Insertion-Deletion Transformer on synthetic translation tasks, obtaining significant BLEU score improvement over an insertion-only model.
Tasks
Published 2020-01-15
URL https://arxiv.org/abs/2001.05540v1
PDF https://arxiv.org/pdf/2001.05540v1.pdf
PWC https://paperswithcode.com/paper/insertion-deletion-transformer
Repo
Framework

On Solving Cooperative MARL Problems with a Few Good Experiences

Title On Solving Cooperative MARL Problems with a Few Good Experiences
Authors Rajiv Ranjan Kumar, Pradeep Varakantham
Abstract Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for cooperative decentralized decision learning in many domains such as search and rescue, drone surveillance, package delivery and fire fighting problems. In these domains, a key challenge is learning with a few good experiences, i.e., positive reinforcements are obtained only in a few situations (e.g., on extinguishing a fire or tracking a crime or delivering a package) and in most other situations there is zero or negative reinforcement. Learning decisions with a few good experiences is extremely challenging in cooperative MARL problems for three reasons. First, compared to the single-agent case, exploration is harder as multiple agents have to be coordinated to receive a good experience. Second, the environment is not stationary as all the agents are learning at the same time (and hence change policies). Third, the scale of the problem increases significantly with every additional agent. Relevant existing work is extensive and has focused on dealing with a few good experiences in single-agent RL problems or on scalable approaches for handling non-stationarity in MARL problems. Unfortunately, neither of these approaches (or their extensions) is able to address the problem of sparse good experiences effectively. Therefore, we provide a novel fictitious self imitation approach that is able to simultaneously handle non-stationarity and sparse good experiences in a scalable manner. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach.
Tasks Multi-agent Reinforcement Learning
Published 2020-01-22
URL https://arxiv.org/abs/2001.07993v1
PDF https://arxiv.org/pdf/2001.07993v1.pdf
PWC https://paperswithcode.com/paper/on-solving-cooperative-marl-problems-with-a
Repo
Framework

Best-item Learning in Random Utility Models with Subset Choices

Title Best-item Learning in Random Utility Models with Subset Choices
Authors Aadirupa Saha, Aditya Gopalan
Abstract We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of $O(\frac{n}{c^2\epsilon^2} \log \frac{k}{\delta})$ rounds to identify an $\epsilon$-optimal item with confidence $1-\delta$, when the worst case pairwise advantage in the RUM has sensitivity at least $c$ to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on $n,k$ and $c$.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.07994v1
PDF https://arxiv.org/pdf/2002.07994v1.pdf
PWC https://paperswithcode.com/paper/best-item-learning-in-random-utility-models
Repo
Framework
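
To get a feel for how the sample complexity stated in the abstract above scales, the bound can be evaluated directly. The sketch below is only arithmetic on the stated $O(\cdot)$ expression: the hidden constant is unknown, so `constant=1.0` is a placeholder assumption, and the function name is hypothetical.

```python
import math

def pac_rounds_bound(n, k, c, epsilon, delta, constant=1.0):
    """Evaluate constant * n / (c^2 * epsilon^2) * log(k / delta)."""
    if not (0 < delta < 1 and epsilon > 0 and c > 0 and 2 <= k <= n):
        raise ValueError("need 0 < delta < 1, epsilon > 0, c > 0 and 2 <= k <= n")
    return constant * n / (c ** 2 * epsilon ** 2) * math.log(k / delta)

base = pac_rounds_bound(n=100, k=5, c=0.5, epsilon=0.1, delta=0.05)
twice_items = pac_rounds_bound(n=200, k=5, c=0.5, epsilon=0.1, delta=0.05)
half_epsilon = pac_rounds_bound(n=100, k=5, c=0.5, epsilon=0.05, delta=0.05)
```

The expression is linear in $n$ and quadratic in $1/\epsilon$, so doubling the pool doubles the value and halving $\epsilon$ quadruples it.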

Fake Review Detection Using Behavioral and Contextual Features

Title Fake Review Detection Using Behavioral and Contextual Features
Authors Jay Kumar
Abstract User reviews carry significant weight for products in e-commerce. Many firms or product providers hire spammers to mislead new customers by posting spam reviews. There are three types of fake reviews: untruthful reviews, brand reviews and non-reviews. All three types mislead new customers. The multinational company “Yelp” has been separating fake reviews from non-fake reviews for the last decade. However, many e-commerce sites do not filter fake reviews from non-fake ones. Researchers have focused on automatic fake review detection for the last ten years. Many approaches and feature sets have been proposed to improve classification models for fake review detection. Two types of dataset are commonly used in this research area: pseudo-fake and real-life reviews. The literature reports lower classification performance on real-life datasets than on pseudo-fake reviews. Upon investigation, behavioral and contextual features prove important for fake review detection. Our research exploits an important behavioral feature of the reviewer, named “reviewer deviation”. Our study investigates reviewer deviation together with other contextual and behavioral features. We empirically demonstrate the importance of the selected feature set for a classification model that identifies fake reviews. We ranked the features in the selected feature set, where reviewer deviation achieved ninth rank. To assess the viability of the selected feature set, we scaled the dataset and concluded that scaling the dataset can improve recall as well as accuracy. Our selected feature set contains a contextual feature that captures text similarity between the reviews of a reviewer. We experimented with the NNC, LTC and BM25 term weighting schemes for calculating the text similarity of reviews. We report that BM25 outperformed the other term weighting schemes.
Tasks
Published 2020-02-26
URL https://arxiv.org/abs/2003.00807v1
PDF https://arxiv.org/pdf/2003.00807v1.pdf
PWC https://paperswithcode.com/paper/fake-review-detection-using-behavioral-and
Repo
Framework
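
The text-similarity feature mentioned in the abstract above could be computed roughly as follows. This is a sketch under assumptions, not the paper's pipeline: it uses one common BM25 variant (with a non-negative IDF), the parameter values `k1=1.2` and `b=0.75` are conventional defaults rather than values from the paper, and the three toy reviews are made up for illustration.

```python
import math
from collections import Counter

def bm25_vectors(docs, k1=1.2, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
        vectors.append({
            t: math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            * f * (k1 + 1.0) / (f + length_norm)
            for t, f in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical reviews by one reviewer, already tokenized.
reviews = [
    "great food great service".split(),
    "great food and friendly service".split(),
    "parking was hard to find".split(),
]
vecs = bm25_vectors(reviews)
sim_near = cosine(vecs[0], vecs[1])
sim_far = cosine(vecs[0], vecs[2])
```

Reviews that share vocabulary score higher than reviews with no terms in common, which score zero.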

The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version

Title The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version
Authors Abram Magner, Mayank Baranwal, Alfred O. Hero III
Abstract Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. These results theoretically match empirical observations of several prior works. Finally, we show a converse result that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.
Tasks Graph Representation Learning, Representation Learning
Published 2020-02-13
URL https://arxiv.org/abs/2002.05678v1
PDF https://arxiv.org/pdf/2002.05678v1.pdf
PWC https://paperswithcode.com/paper/the-power-of-graph-convolutional-networks-to-1
Repo
Framework
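
The connection to random walks mentioned in the abstract above can be illustrated informally. The sketch below is not the paper's construction: it strips a graph convolution down to the random-walk averaging step $h \leftarrow D^{-1} A h$, with no learned weights or nonlinearity, on a hypothetical 4-node graph. It only shows the general effect that repeated averaging mixes node features toward a common value as depth grows.

```python
def propagate(adj, features, layers):
    """Apply `layers` rounds of random-walk averaging, h <- D^{-1} A h."""
    h = list(features)
    for _ in range(layers):
        h = [sum(a * x for a, x in zip(row, h)) / sum(row) for row in adj]
    return h

def spread(h):
    """Gap between the largest and smallest node value."""
    return max(h) - min(h)

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3.
# Self-loops are included so that the walk is aperiodic.
adj = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
features = [1.0, 0.0, 0.0, 0.0]
shallow = propagate(adj, features, 1)
deep = propagate(adj, features, 50)
```

After one layer the node values still differ noticeably; after many layers they are numerically indistinguishable.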

Learning Unitaries by Gradient Descent

Title Learning Unitaries by Gradient Descent
Authors Bobak Toussi Kiani, Seth Lloyd, Reevu Maity
Abstract We study the hardness of learning unitary transformations in $U(d)$ via gradient descent on time parameters of alternating operator sequences. We provide numerical evidence that, despite the non-convex nature of the loss landscape, gradient descent always converges to the target unitary when the sequence contains $d^2$ or more parameters. Rates of convergence indicate a “computational phase transition.” With fewer than $d^2$ parameters, gradient descent converges to a sub-optimal solution, whereas with more than $d^2$ parameters, gradient descent converges exponentially to an optimal solution.
Tasks
Published 2020-01-31
URL https://arxiv.org/abs/2001.11897v3
PDF https://arxiv.org/pdf/2001.11897v3.pdf
PWC https://paperswithcode.com/paper/learning-unitaries-by-gradient-descent
Repo
Framework

Towards Graph Representation Learning in Emergent Communication

Title Towards Graph Representation Learning in Emergent Communication
Authors Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden
Abstract Recent findings in neuroscience suggest that the human brain represents information in a geometric structure (for instance, through conceptual spaces). In order to communicate, we flatten the complex representation of entities and their attributes into a single word or a sentence. In this paper we use graph convolutional networks to support the evolution of language and cooperation in multi-agent systems. Motivated by an image-based referential game, we propose a graph referential game with varying degrees of complexity, and we provide strong baseline models that exhibit desirable properties in terms of language emergence and cooperation. We show that the emerged communication protocol is robust, that the agents uncover the true factors of variation in the game, and that they learn to generalize beyond the samples encountered during training.
Tasks Graph Representation Learning, Representation Learning
Published 2020-01-24
URL https://arxiv.org/abs/2001.09063v2
PDF https://arxiv.org/pdf/2001.09063v2.pdf
PWC https://paperswithcode.com/paper/towards-graph-representation-learning-in
Repo
Framework

Needmining: Identifying micro blog data containing customer needs

Title Needmining: Identifying micro blog data containing customer needs
Authors Niklas Kühl, Jan Scheurenbrand, Gerhard Satzger
Abstract The design of new products and services starts with the identification of needs of potential customers or users. Many existing methods like observations, surveys, and experiments draw upon specific efforts to elicit unsatisfied needs from individuals. At the same time, a huge amount of user-generated content in micro blogs is freely accessible at no cost. While this information is already analyzed to monitor sentiments towards existing offerings, it has not yet been tapped for the elicitation of needs. In this paper, we lay an important foundation for this endeavor: we propose a Machine Learning approach to identify those posts that do express needs. Our evaluation of tweets in the e-mobility domain demonstrates that the small share of relevant tweets can be identified with remarkable precision or recall results. Applied to huge data sets, the developed method should enable scalable need elicitation support for innovation managers across thousands of users, and thus augment the service design tool set available to them.
Tasks
Published 2020-03-12
URL https://arxiv.org/abs/2003.05917v1
PDF https://arxiv.org/pdf/2003.05917v1.pdf
PWC https://paperswithcode.com/paper/needmining-identifying-micro-blog-data
Repo
Framework

Robust Generalization via $α$-Mutual Information

Title Robust Generalization via $α$-Mutual Information
Authors Amedeo Roberto Esposito, Michael Gastpar, Ibrahim Issa
Abstract The aim of this work is to provide bounds connecting two probability measures of the same event using Rényi $\alpha$-Divergences and Sibson’s $\alpha$-Mutual Information, generalizations of, respectively, the Kullback-Leibler Divergence and Shannon’s Mutual Information. A particular case of interest can be found when the two probability measures considered are a joint distribution and the corresponding product of marginals (representing the statistically independent scenario). In this case, a bound using Sibson’s $\alpha$-Mutual Information is retrieved, extending a result involving Maximal Leakage to general alphabets. These results have broad applications, from bounding the generalization error of learning algorithms to the more general framework of adaptive data analysis, provided that the divergences and/or information measures used are amenable to such an analysis (i.e., are robust to post-processing and compose adaptively). The generalization error bounds are derived with respect to high-probability events but a corresponding bound on expected generalization error is also retrieved.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.06399v1
PDF https://arxiv.org/pdf/2001.06399v1.pdf
PWC https://paperswithcode.com/paper/robust-generalization-via-mutual-information
Repo
Framework
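
For readers unfamiliar with the Rényi $\alpha$-divergence named in the abstract above, a small numerical sketch may help. It uses the standard discrete definition $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_i p_i^{\alpha} q_i^{1-\alpha}$ and checks that it approaches the Kullback-Leibler divergence as $\alpha \to 1$. The two distributions are made-up examples, and the code assumes every entry of `q` is strictly positive.

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) in nats for discrete distributions, alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    # Assumes q_i > 0 everywhere; terms with p_i == 0 contribute nothing.
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
near_one = renyi_divergence(p, q, 1.0001)
kl = kl_divergence(p, q)
```

The divergence is also non-decreasing in $\alpha$, so smaller orders give looser but more permissive measures of discrepancy.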

Learning Product Graphs Underlying Smooth Graph Signals

Title Learning Product Graphs Underlying Smooth Graph Signals
Authors Muhammad Asad Lodhi, Waheed U. Bajwa
Abstract Real-world data is often associated with irregular structures that can analytically be represented as graphs. Having access to this graph, which is sometimes trivially evident from domain knowledge, provides a better representation of the data and facilitates various information processing tasks. However, in cases where the underlying graph is unavailable, it needs to be learned from the data itself for data representation, data processing and inference purposes. Existing literature on learning graphs from data has mostly considered arbitrary graphs, whereas the graphs generating real-world data tend to have additional structure that can be incorporated in the graph learning procedure. Structure-aware graph learning methods require learning fewer parameters and have the potential to reduce computational, memory and sample complexities. In light of this, the focus of this paper is to devise a method to learn structured graphs from data that are given in the form of product graphs. Product graphs arise naturally in many real-world datasets and provide an efficient and compact representation of large-scale graphs through several smaller factor graphs. To this end, first the graph learning problem is posed as a linear program, which (on average) outperforms the state-of-the-art graph learning algorithms. This formulation is of independent interest itself as it shows that graph learning is possible through a simple linear program. Afterwards, an alternating minimization-based algorithm aimed at learning various types of product graphs is proposed, and local convergence guarantees to the true solution are established for this algorithm. Finally, the performance gains, reduced sample complexity, and inference capabilities of the proposed algorithm over existing methods are also validated through numerical simulations on synthetic and real datasets.
Tasks
Published 2020-02-26
URL https://arxiv.org/abs/2002.11277v1
PDF https://arxiv.org/pdf/2002.11277v1.pdf
PWC https://paperswithcode.com/paper/learning-product-graphs-underlying-smooth
Repo
Framework
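
The compactness argument in the abstract above, a large graph described by small factor graphs, can be made concrete. The sketch below is an illustration and not the paper's learning algorithm: it picks the Cartesian product as one example product type, builds its Laplacian as the Kronecker sum $L_1 \otimes I + I \otimes L_2$ of two hypothetical factor graphs, and evaluates the smoothness measure $x^\top L x$ of a signal.

```python
def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an adjacency matrix without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def cartesian_product_laplacian(l1, l2):
    """Kronecker sum L1 (x) I + I (x) L2; node (i, j) maps to index i * n2 + j."""
    n1, n2 = len(l1), len(l2)
    size = n1 * n2
    out = [[0.0] * size for _ in range(size)]
    for i in range(n1):
        for j in range(n2):
            for k in range(n1):
                out[i * n2 + j][k * n2 + j] += l1[i][k]
            for k in range(n2):
                out[i * n2 + j][i * n2 + k] += l2[j][k]
    return out

def smoothness(lap, x):
    """Quadratic form x^T L x; small values mean x varies little across edges."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))

path = [[0, 1], [1, 0]]                        # hypothetical 2-node factor
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # hypothetical 3-node factor
lap = cartesian_product_laplacian(laplacian(path), laplacian(triangle))
constant_signal = smoothness(lap, [1.0] * 6)
rough_signal = smoothness(lap, [1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

Here a 6-node graph is specified by a 2-node and a 3-node factor; the saving in free parameters grows quickly with factor size.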

Revisiting the Sibling Head in Object Detector

Title Revisiting the Sibling Head in Object Detector
Authors Guanglu Song, Yu Liu, Xiaogang Wang
Abstract The “shared head for classification and localization” (sibling head), first introduced in Fast RCNN (Girshick, 2015), has been leading the fashion of the object detection community in the past five years. This paper provides the observation that the spatial misalignment between the two objective functions in the sibling head can considerably hurt the training process, but this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD). Considering the classification and regression, TSD decouples them from the spatial dimension by generating two disentangled proposals for them, which are estimated by the shared proposal. This is inspired by the natural insight that for one instance, the features in some salient area may have rich information for classification while those around the boundary may be good at bounding box regression. Surprisingly, this simple design can boost all backbones and models on both MS COCO and Google OpenImage consistently by ~3% mAP. Further, we propose a progressive constraint to enlarge the performance margin between the disentangled and the shared proposals, and gain ~1% more mAP. We show that TSD breaks through the upper bound of today's single-model detectors by a large margin (mAP 49.4 with ResNet-101, 51.2 with SENet154), and is the core model of our 1st place solution on the Google OpenImage Challenge 2019.
Tasks Object Detection
Published 2020-03-17
URL https://arxiv.org/abs/2003.07540v1
PDF https://arxiv.org/pdf/2003.07540v1.pdf
PWC https://paperswithcode.com/paper/revisiting-the-sibling-head-in-object
Repo
Framework

Multi-Lead ECG Classification via an Information-Based Attention Convolutional Neural Network

Title Multi-Lead ECG Classification via an Information-Based Attention Convolutional Neural Network
Authors Hao Tung, Chao Zheng, Xinsheng Mao, Dahong Qian
Abstract Objective: A novel structure based on a channel-wise attention mechanism is presented in this paper. By embedding the proposed structure, an efficient classification model that accepts multi-lead electrocardiogram (ECG) signals as input is constructed. Methods: One-dimensional convolutional neural networks (CNN) have proven to be effective in pervasive classification tasks, enabling the automatic extraction of features while classifying targets. We implement the residual connection and design a structure which can learn the weights from the information contained in different channels in the input feature map during the training process. An indicator named mean square deviation is introduced to monitor the performance of a particular model segment in the classification task on two of the five ECG classes. The data in the MIT-BIH arrhythmia database is used and a series of control experiments is conducted. Results: Utilizing both leads of the ECG signals as input to the neural network classifier can achieve better classification results than those from using single-channel inputs in different application scenarios. Models embedded with the channel-wise attention structure always achieve better scores on sensitivity and precision than the plain ResNet models. The proposed model exceeds the performance of most of the state-of-the-art models in ventricular ectopic beats (VEB) classification, and achieves competitive scores for supraventricular ectopic beats (SVEB). Conclusion: Adopting more ECG leads as input can increase the dimensions of the input feature maps, helping to improve both the performance and generalization of the network model. Significance: Due to its end-to-end characteristics and its extensibility to multi-lead heart disease diagnosis, the proposed model can be used for real-time tracking of ECG waveforms on Holter or wearable devices.
Tasks ECG Classification
Published 2020-03-25
URL https://arxiv.org/abs/2003.12009v1
PDF https://arxiv.org/pdf/2003.12009v1.pdf
PWC https://paperswithcode.com/paper/multi-lead-ecg-classification-via-an
Repo
Framework

“An Image is Worth a Thousand Features”: Scalable Product Representations for In-Session Type-Ahead Personalization

Title “An Image is Worth a Thousand Features”: Scalable Product Representations for In-Session Type-Ahead Personalization
Authors Bingqing Yu, Jacopo Tagliabue, Ciro Greco, Federico Bianchi
Abstract We address the problem of personalizing query completion in a digital commerce setting, in which the bounce rate is typically high and recurring users are rare. We focus on in-session personalization and improve a standard noisy channel model by injecting dense vectors computed from product images at query time. We argue that image-based personalization displays several advantages over alternative proposals (from data availability to business scalability), and provide quantitative evidence and qualitative support on the effectiveness of the proposed methods. Finally, we show how a shared vector space between similar shops can be used to improve the experience of users browsing across sites, opening up the possibility of applying zero-shot unsupervised personalization to increase conversions. This will prove to be particularly relevant to retail groups that manage multiple brands and/or websites and to multi-tenant SaaS providers that serve multiple clients in the same space.
Tasks
Published 2020-03-11
URL https://arxiv.org/abs/2003.07160v1
PDF https://arxiv.org/pdf/2003.07160v1.pdf
PWC https://paperswithcode.com/paper/an-image-is-worth-a-thousand-features
Repo
Framework