January 27, 2020


Paper Group ANR 1139



Social Influence-based Attentive Mavens Mining and Aggregative Representation Learning for Group Recommendation

Title Social Influence-based Attentive Mavens Mining and Aggregative Representation Learning for Group Recommendation
Authors Peipei Wang, Lin Li, Yi Yu, Guandong Xu
Abstract Frequent group activities have become an indispensable part of daily life. Group recommendation can recommend satisfactory activities to group members in recommender systems, and the key issue is how to aggregate the preferences of different group members. Most existing group recommendation methods employ predefined static aggregation strategies to aggregate the preferences of different group members, but these static strategies cannot simulate dynamic group decision-making. Meanwhile, most of these methods depend on intuitions or assumptions to analyze the influence of group members and lack convincing theoretical support. We argue that the influence of group members plays a particularly important role in group decision-making and that it can better assist group profile modeling and yield more accurate group recommendation. To tackle the issue of preference aggregation for group recommendation, we propose a novel attentive aggregation representation learning method based on sociological theory for group recommendation, namely SIAGR (short for “Social Influence-based Attentive Group Recommendation”), which takes attention mechanisms and the popular method (BERT) as the aggregation representation for group profile modeling. Specifically, we analyze the influence of group members based on social identity theory and two-step flow theory and exploit an attentive mavens mining method. In addition, we develop a BERT-based representation method to learn the interaction of group members. Lastly, we complete the group recommendation under the neural collaborative filtering framework and verify the effectiveness of the proposed method experimentally.
Tasks Decision Making, Recommendation Systems, Representation Learning
Published 2019-08-10
URL https://arxiv.org/abs/1909.01079v1
PDF https://arxiv.org/pdf/1909.01079v1.pdf
PWC https://paperswithcode.com/paper/social-influence-based-attentive-mavens
Repo
Framework

Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments

Title Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Sparse Reward Environments
Authors Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich
Abstract This paper investigates how to efficiently transition and update policies, trained initially with demonstrations, using off-policy actor-critic reinforcement learning. It is well-known that techniques based on Learning from Demonstrations, for example behavior cloning, can lead to proficient policies given limited data. However, it is currently unclear how to efficiently update that policy using reinforcement learning as these approaches are inherently optimizing different objective functions. Previous works have used loss functions which combine behavioral cloning losses with reinforcement learning losses to enable this update, however, the components of these loss functions are often set anecdotally, and their individual contributions are not well understood. In this work we propose the Cycle-of-Learning (CoL) framework that uses an actor-critic architecture with a loss function that combines behavior cloning and 1-step Q-learning losses with an off-policy pre-training step from human demonstrations. This enables transition from behavior cloning to reinforcement learning without performance degradation and improves reinforcement learning in terms of overall performance and training time. Additionally, we carefully study the composition of these combined losses and their impact on overall policy learning. We show that our approach outperforms state-of-the-art techniques for combining behavior cloning and reinforcement learning for both dense and sparse reward scenarios. Our results also suggest that directly including the behavior cloning loss on demonstration data helps to ensure stable learning and ground future policy updates.
Tasks Q-Learning
Published 2019-10-09
URL https://arxiv.org/abs/1910.04281v1
PDF https://arxiv.org/pdf/1910.04281v1.pdf
PWC https://paperswithcode.com/paper/integrating-behavior-cloning-and
Repo
Framework
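
The abstract above describes a loss that combines behavior cloning with 1-step Q-learning. A minimal sketch of that idea follows, assuming scalar actions, a mean-squared-error cloning term, and hypothetical weights `lambda_bc` and `lambda_q1`; the abstract does not give the exact terms or weights used in CoL, so none of these names come from the paper.

```python
def bc_loss(policy_actions, demo_actions):
    """Mean squared error between policy actions and demonstrated actions."""
    pairs = list(zip(policy_actions, demo_actions))
    return sum((p - d) ** 2 for p, d in pairs) / len(pairs)


def one_step_q_loss(q_values, rewards, next_q_values, dones, gamma=0.99):
    """Mean squared 1-step TD error.

    The target is r + gamma * Q(s', a') for non-terminal transitions
    and just r for terminal ones.
    """
    total = 0.0
    for q, r, q_next, done in zip(q_values, rewards, next_q_values, dones):
        target = r + (0.0 if done else gamma * q_next)
        total += (target - q) ** 2
    return total / len(q_values)


def combined_loss(bc, q1, lambda_bc=1.0, lambda_q1=1.0):
    """Weighted sum of the two terms (weights are illustrative assumptions)."""
    return lambda_bc * bc + lambda_q1 * q1
```

Keeping the cloning term in the sum during the reinforcement learning phase is one way to read the abstract's remark that including the behavior cloning loss on demonstration data helps ground later policy updates.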

A Specialized Evolutionary Strategy Using Mean Absolute Error Random Sampling to Design Recurrent Neural Networks

Title A Specialized Evolutionary Strategy Using Mean Absolute Error Random Sampling to Design Recurrent Neural Networks
Authors Andrés Camero, Jamal Toutouh, Enrique Alba
Abstract Recurrent neural networks have proven to be good at solving prediction problems. However, finding a network that suits a problem is quite hard because of their high sensitivity to the hyperparameter configuration. Automatic hyperparameter optimization methods help to find the most suitable configuration, but they are not extensively adopted because of their high computational cost. In this work, we study the use of the mean absolute error random sampling to compare multiple-hidden-layer architectures and propose an evolutionary strategy-based algorithm that uses its results to optimize the configuration of a recurrent network. We empirically validate our proposal and show that it is possible to predict and compare the expected performance of a hyperparameter configuration in a low-cost way, as well as use these predictions to optimize the configuration of a recurrent network.
Tasks Hyperparameter Optimization
Published 2019-09-04
URL https://arxiv.org/abs/1909.02425v1
PDF https://arxiv.org/pdf/1909.02425v1.pdf
PWC https://paperswithcode.com/paper/a-specialized-evolutionary-strategy-using
Repo
Framework

Performance Evalution of 3D Keypoint Detectors and Descriptors for Plants Health Classification

Title Performance Evalution of 3D Keypoint Detectors and Descriptors for Plants Health Classification
Authors Shiva Azimi, Brejesh Lall, Tapan K. Gandhi
Abstract Plant phenomics based on imaging techniques can be used to monitor the health and diseases of plants and crops. The use of 3D data for plant phenomics is a recent phenomenon. Since a 3D point cloud contains more information than plant images, in this paper we compare the performance of different combinations of keypoint detectors and local feature descriptors for classifying a plant’s growth stage and growth condition based on 3D point clouds of the plants. We have also implemented a modified form of the 3D SIFT descriptor that is invariant to rotation and is computationally less intensive than most of the 3D SIFT descriptors reported in the existing literature. The performance is evaluated in terms of classification accuracy, and the results are presented in accuracy tables. We find that the ISS-SHOT and SIFT-SIFT combinations consistently perform better and that Fisher Vector (FV) is a better encoder than Vector of Locally Aggregated Descriptors (VLAD) for such applications. It can serve as a better modality.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.08493v1
PDF http://arxiv.org/pdf/1904.08493v1.pdf
PWC https://paperswithcode.com/paper/190408493
Repo
Framework

Who wrote this book? A challenge for e-commerce

Title Who wrote this book? A challenge for e-commerce
Authors Béranger Dumont, Simona Maggio, Ghiles Sidi Said, Quoc-Tien Au
Abstract Modern e-commerce catalogs contain millions of references, associated with textual and visual information that is of paramount importance for the products to be found via search or browsing. Of particular significance is the book category, where the author name(s) field poses a significant challenge. Indeed, books written by a given author (such as F. Scott Fitzgerald) might be listed with different authors’ names in a catalog due to abbreviations and spelling variants and mistakes, among others. To solve this problem at scale, we design a composite system involving open data sources for books as well as machine learning components leveraging deep learning-based techniques for natural language processing. In particular, we use Siamese neural networks for an approximate match with known author names, and direct correction of the provided author’s name using sequence-to-sequence learning with neural networks. We evaluate this approach on product data from the e-commerce website Rakuten France, and find that the top proposal of the system is the normalized author name with 72% accuracy.
Tasks
Published 2019-04-19
URL http://arxiv.org/abs/1905.01973v1
PDF http://arxiv.org/pdf/1905.01973v1.pdf
PWC https://paperswithcode.com/paper/190501973
Repo
Framework

Learning Agent Communication under Limited Bandwidth by Message Pruning

Title Learning Agent Communication under Limited Bandwidth by Message Pruning
Authors Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong, Yan Ni
Abstract Communication is a crucial factor for the big multi-agent world to stay organized and productive. Recently, Deep Reinforcement Learning (DRL) has been applied to learn the communication strategy and the control policy for multiple agents. However, the practical limited bandwidth in multi-agent communication has been largely ignored by the existing DRL methods. Specifically, many methods keep sending messages incessantly, which consumes too much bandwidth. As a result, they are inapplicable to multi-agent systems with limited bandwidth. To handle this problem, we propose a gating mechanism to adaptively prune less beneficial messages. We evaluate the gating mechanism on several tasks. Experiments demonstrate that it can prune a lot of messages with little impact on performance. In fact, the performance may be greatly improved by pruning redundant messages. Moreover, the proposed gating mechanism is applicable to several previous methods, equipping them with the ability to address bandwidth-restricted settings.
Tasks
Published 2019-12-03
URL https://arxiv.org/abs/1912.05304v1
PDF https://arxiv.org/pdf/1912.05304v1.pdf
PWC https://paperswithcode.com/paper/learning-agent-communication-under-limited
Repo
Framework

SPICE: Self-supervised Pitch Estimation

Title SPICE: Self-supervised Pitch Estimation
Authors Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović
Abstract We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. We acknowledge the fact that obtaining ground truth annotations at the required temporal and frequency resolution is a particularly daunting task. Therefore, we propose to adopt a self-supervised learning technique, which is able to estimate (relative) pitch without any form of supervision. The key observation is that pitch shift maps to a simple translation when the audio signal is analysed through the lens of the constant-Q transform (CQT). We design a self-supervised task by feeding two shifted slices of the CQT to the same convolutional encoder, and require that the difference in the outputs is proportional to the corresponding difference in pitch. In addition, we introduce a small model head on top of the encoder, which is able to determine the confidence of the pitch estimate, so as to distinguish between voiced and unvoiced audio. Our results show that the proposed method is able to estimate pitch at a level of accuracy comparable to fully supervised models, both on clean and noisy audio samples, yet it does not require access to large labeled datasets.
Tasks
Published 2019-10-25
URL https://arxiv.org/abs/1910.11664v1
PDF https://arxiv.org/pdf/1910.11664v1.pdf
PWC https://paperswithcode.com/paper/spice-self-supervised-pitch-estimation
Repo
Framework
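
The key observation in the SPICE abstract can be checked numerically: because CQT bins are spaced logarithmically in frequency, shifting pitch by a fixed musical interval moves the signal by the same number of bins regardless of the starting frequency. The sketch below illustrates that, plus one possible form of the relative-pitch objective. The function names, the absolute-error form of the loss, and the scale factor `sigma` are assumptions for illustration; the abstract only states that the output difference should be proportional to the pitch difference.

```python
import math


def cqt_bin(frequency_hz, fmin_hz, bins_per_octave):
    """Fractional CQT bin index of a frequency (bins are log-spaced)."""
    return bins_per_octave * math.log2(frequency_hz / fmin_hz)


def semitone_shift(frequency_hz, semitones):
    """Frequency after shifting pitch by a number of equal-tempered semitones."""
    return frequency_hz * 2.0 ** (semitones / 12.0)


def relative_pitch_loss(output_1, output_2, shift_1, shift_2, sigma):
    """Absolute error between the encoder output difference and the
    scaled difference of the two CQT bin shifts (assumed loss form)."""
    return abs((output_1 - output_2) - sigma * (shift_1 - shift_2))
```

With 24 bins per octave, a shift of two semitones corresponds to four bins at any starting frequency, which is what lets shifted CQT slices act as free training signal for relative pitch.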

2nd Place Solution in Google AI Open Images Object Detection Track 2019

Title 2nd Place Solution in Google AI Open Images Object Detection Track 2019
Authors Ruoyu Guo, Cheng Cui, Yuning Du, Xianglong Meng, Xiaodi Wang, Jingwei Liu, Jianfeng Zhu, Yuan Feng, Shumin Han
Abstract We present an object detection framework based on PaddlePaddle. We put all the strategies together (multi-scale training, FPN, Cascade, Dcnv2, Non-local, libra loss) based on ResNet200-vd backbone. Our model score on public leaderboard comes to 0.6269 with single scale test. We proposed a new voting method called top-k voting-nms, based on the SoftNMS detection results. The voting method helps us merge all the models’ results more easily and achieve 2nd place in the Google AI Open Images Object Detection Track 2019.
Tasks Object Detection
Published 2019-11-17
URL https://arxiv.org/abs/1911.07171v1
PDF https://arxiv.org/pdf/1911.07171v1.pdf
PWC https://paperswithcode.com/paper/2nd-place-solution-in-google-ai-open-images
Repo
Framework

The Heidelberg spiking datasets for the systematic evaluation of spiking neural networks

Title The Heidelberg spiking datasets for the systematic evaluation of spiking neural networks
Authors Benjamin Cramer, Yannik Stradmann, Johannes Schemmel, Friedemann Zenke
Abstract Spiking neural networks are the basis of versatile and power-efficient information processing in the brain. Although we currently lack a detailed understanding of how these networks compute, recently developed optimization techniques allow us to instantiate increasingly complex functional spiking neural networks in-silico. These methods hold the promise to build more efficient non-von-Neumann computing hardware and will offer new vistas in the quest of unraveling brain circuit function. To accelerate the development of such methods, objective ways to compare their performance are indispensable. Presently, however, there are no widely accepted means for comparing the computational performance of spiking neural networks. To address this issue, we introduce a general audio-to-spiking conversion procedure and provide two novel spike-based classification datasets. The datasets are free and require no additional preprocessing, which renders them broadly applicable to benchmark both software and neuromorphic hardware implementations of spiking neural networks. By training a range of conventional and spiking classifiers, we show that leveraging spike timing information within these datasets is essential for good classification accuracy. These results serve as the first reference for future performance comparisons of spiking neural networks.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07407v2
PDF https://arxiv.org/pdf/1910.07407v2.pdf
PWC https://paperswithcode.com/paper/the-heidelberg-spiking-datasets-for-the
Repo
Framework

Learning Rich Representations For Structured Visual Prediction Tasks

Title Learning Rich Representations For Structured Visual Prediction Tasks
Authors Mohammadreza Mostajabi
Abstract We describe an approach to learning rich representations for images that enables simple and effective predictors in a range of vision tasks involving spatially structured maps. Our key idea is to map small image elements to feature representations extracted from a sequence of nested regions of increasing spatial extent. These regions are obtained by “zooming out” from the pixel/superpixel all the way to scene-level resolution, and hence we call these zoom-out features. Applied to semantic segmentation and other structured prediction tasks, our approach exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference. Instead image elements are classified by a feedforward multilayer network with skip-layer connections spanning the zoom-out levels. When used in conjunction with modern neural architectures such as ResNet, DenseNet and NASNet (to which it is complementary) our approach achieves competitive accuracy on segmentation benchmarks. In addition, we propose an approach for learning category-level semantic segmentation purely from image-level classification tags. It exploits localization cues that emerge from training a modified zoom-out architecture tailored for classification tasks, to drive a weakly supervised process that automatically labels a sparse, diverse training set of points likely to belong to classes of interest. Finally, we introduce data-driven regularization functions for the supervised training of CNNs. Our innovation takes the form of a regularizer derived by learning an autoencoder over the set of annotations. This approach leverages an improved representation of label space to inform extraction of features from images.
Tasks Semantic Segmentation, Structured Prediction
Published 2019-08-30
URL https://arxiv.org/abs/1908.11820v1
PDF https://arxiv.org/pdf/1908.11820v1.pdf
PWC https://paperswithcode.com/paper/learning-rich-representations-for-structured
Repo
Framework

Investigating Retrieval Method Selection with Axiomatic Features

Title Investigating Retrieval Method Selection with Axiomatic Features
Authors Siddhant Arora, Andrew Yates
Abstract We consider algorithm selection in the context of ad-hoc information retrieval. Given a query and a pair of retrieval methods, we propose a meta-learner that predicts how to combine the methods’ relevance scores into an overall relevance score. Inspired by neural models’ different properties with regard to IR axioms, these predictions are based on features that quantify axiom-related properties of the query and its top ranked documents. We conduct an evaluation on TREC Web Track data and find that the meta-learner often significantly improves over the individual methods. Finally, we conduct feature and query weight analyses to investigate the meta-learner’s behavior.
Tasks Ad-Hoc Information Retrieval, Information Retrieval
Published 2019-04-11
URL http://arxiv.org/abs/1904.05737v1
PDF http://arxiv.org/pdf/1904.05737v1.pdf
PWC https://paperswithcode.com/paper/investigating-retrieval-method-selection-with
Repo
Framework
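
The abstract above says the meta-learner predicts how to combine two retrieval methods' relevance scores into one overall score, without stating the functional form. One simple possibility is linear interpolation with a per-query weight, sketched below. The interpolation form, the function names, and the `weight_a` parameter are all assumptions for illustration, not the paper's method.

```python
def combine_scores(scores_a, scores_b, weight_a):
    """Interpolate two methods' per-document scores for a single query.

    scores_a and scores_b map document ids to relevance scores;
    weight_a in [0, 1] is the (hypothetical) per-query weight a
    meta-learner would predict for method A.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    shared_docs = scores_a.keys() & scores_b.keys()
    return {
        doc: weight_a * scores_a[doc] + (1.0 - weight_a) * scores_b[doc]
        for doc in shared_docs
    }


def rank(scores):
    """Document ids sorted by descending score, ties broken by id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

Setting `weight_a` to 0 or 1 recovers plain method selection, so interpolation can be seen as a soft version of choosing one method per query.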

Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks

Title Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks
Authors Guy Uziel
Abstract Deep neural networks are considered to be state of the art models in many offline machine learning tasks. However, their performance and generalization abilities in online learning tasks are much less understood. Therefore, we focus on online learning and tackle the challenging problem where the underlying process is stationary and ergodic and thus removing the i.i.d. assumption and allowing observations to depend on each other arbitrarily. We prove the generalization abilities of Lipschitz regularized deep neural networks and show that by using those networks, a convergence to the best possible prediction strategy is guaranteed.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10821v1
PDF https://arxiv.org/pdf/1905.10821v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-online-learning-using-lipschitz
Repo
Framework

Concentration of the matrix-valued minimum mean-square error in optimal Bayesian inference

Title Concentration of the matrix-valued minimum mean-square error in optimal Bayesian inference
Authors Jean Barbier
Abstract We consider Bayesian inference of signals with vector-valued entries. Extending concentration techniques from the mathematical physics of spin glasses, we show that the matrix-valued minimum mean-square error concentrates when the size of the problem increases. Such results are often crucial for proving single-letter formulas for the mutual information when they exist. Our proof is valid in the optimal Bayesian inference setting, meaning that it relies on the assumption that the model and all its hyper-parameters are known. Examples of inference and learning problems covered by our results are spiked matrix and tensor models, the committee machine neural network with few hidden neurons in the teacher-student scenario, or multi-layers generalized linear models.
Tasks Bayesian Inference
Published 2019-07-15
URL https://arxiv.org/abs/1907.07103v1
PDF https://arxiv.org/pdf/1907.07103v1.pdf
PWC https://paperswithcode.com/paper/concentration-of-the-matrix-valued-minimum
Repo
Framework

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

Title rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method
Authors Zheng-Hua Tan, Achintya kr. Sarkar, Najim Dehak
Abstract This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected by using a posteriori signal-to-noise ratio (SNR) weighted energy difference and if no pitch is detected within a segment, the segment is considered as a high-energy noise segment and set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and based on speech statistics, the pitch segments are further extended from both ends in order to include both voiced and unvoiced sounds and likely non-speech parts as well. In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity. We evaluate the VAD performance of the proposed method using two databases, RATS and Aurora-2, which contain a large variety of noise conditions. The rVAD method is further evaluated, in terms of speaker verification performance, on the RedDots 2016 challenge database and its noise-corrupted versions. Experiment results show that rVAD compares favourably with a number of existing methods. In addition, we present a modified version of rVAD where computationally intensive pitch extraction is replaced by computationally efficient spectral flatness calculation. The modified version significantly reduces the computational complexity at the cost of moderately inferior VAD performance, which is an advantage when processing a large amount of data and running on low-resource devices. The source code of rVAD is made publicly available.
Tasks Action Detection, Activity Detection, Denoising, Speaker Verification, Speech Enhancement
Published 2019-06-09
URL https://arxiv.org/abs/1906.03588v1
PDF https://arxiv.org/pdf/1906.03588v1.pdf
PWC https://paperswithcode.com/paper/rvad-an-unsupervised-segment-based-robust
Repo
Framework
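
The modified rVAD mentioned above swaps pitch extraction for a spectral flatness calculation. As a reference point, the sketch below computes the standard definition of spectral flatness, the ratio of the geometric mean to the arithmetic mean of a power spectrum. How rVAD applies the measure (frame sizes, thresholds) is not given in the abstract, so the function name and the small `eps` guard are assumptions for illustration only.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum. eps guards against log(0) and
    division by zero.
    """
    n = len(power_spectrum)
    mean_log = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(mean_log) / (arithmetic_mean + eps)
```

The measure needs only one pass over the spectrum of each frame, which is consistent with the abstract's description of it as computationally efficient relative to pitch extraction.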

Cricket stroke extraction: Towards creation of a large-scale cricket actions dataset

Title Cricket stroke extraction: Towards creation of a large-scale cricket actions dataset
Authors Arpan Gupta, Sakthi Balan M
Abstract In this paper, we deal with the problem of temporal action localization for a large-scale untrimmed cricket videos dataset. Our action of interest for cricket videos is a cricket stroke played by a batsman, which is usually covered by cameras placed at the stands of the cricket ground at both ends of the cricket pitch. After applying a sequence of preprocessing steps, we have ~73 million frames for 1110 videos in the dataset at constant frame rate and resolution. The method of localization is a generalized one which applies a trained random forest model for CUTs detection (using summed up grayscale histogram difference features) and two linear SVM camera models (CAM1 and CAM2) for first frame detection, trained on HOG features of CAM1 and CAM2 video shots. CAM1 and CAM2 are assumed to be part of the cricket stroke. At the predicted boundary positions, the HOG features of the first frames are computed and a simple algorithm is used to combine the positively predicted camera shots. In order to make the process as generic as possible, we did not consider any domain specific knowledge, such as tracking or specific shape and motion features. The detailed analysis of our methodology is provided along with the metrics used for evaluation of individual models, and the final predicted segments. We achieved a weighted mean TIoU of 0.5097 over a small sample of the test set.
Tasks Action Localization, Game of Cricket, Temporal Action Localization
Published 2019-01-10
URL http://arxiv.org/abs/1901.03107v1
PDF http://arxiv.org/pdf/1901.03107v1.pdf
PWC https://paperswithcode.com/paper/cricket-stroke-extraction-towards-creation-of
Repo
Framework
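
The cut-detection step above relies on summed grayscale histogram difference features. A minimal sketch of such a feature for a pair of consecutive frames follows, assuming frames are given as nested lists of integer pixel values in 0..255. The function names are hypothetical, and the random forest that consumes the feature in the paper is not shown.

```python
def grayscale_histogram(frame):
    """256-bin histogram of a grayscale frame given as rows of ints in 0..255."""
    histogram = [0] * 256
    for row in frame:
        for pixel in row:
            histogram[pixel] += 1
    return histogram


def histogram_difference(frame_a, frame_b):
    """Sum of absolute bin-wise differences between two frames' histograms."""
    hist_a = grayscale_histogram(frame_a)
    hist_b = grayscale_histogram(frame_b)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

A large value between consecutive frames suggests an abrupt change in image content, which is why a feature of this kind is a plausible input for a shot-boundary classifier.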