Utilizing multiple, weighted data models for a Mahout recommender

Views: 116 · Published: 2020/5/5 11:15:26 · Tags: mahout, recommendation-engine

Question

I have a boolean-preference recommender based on user similarity. My data set essentially contains user-item relations where each ItemId is an article the user has decided to read. I'd like to add a second data model in which each ItemId is a subscription to a particular topic.

The only way I can imagine doing this is by merging the two together, offsetting the subscription IDs so that they don't collide with the article IDs. For weighting, I considered dropping the boolean preference setup and introducing preference scores, where the articles subset has a preference score of 1 (for example) and the subscriptions subset has a preference score of 2.
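
The merge described above can be sketched as follows. This is only an illustration of the plan, not a Mahout API: the function name `merge_models`, the offset value, and the sample IDs are all hypothetical, and the output is simply a list of `(userID, itemID, value)` rows of the kind a file-based data model could be fed.

```python
# Hypothetical offset; it must be larger than every article ID in the data.
SUBSCRIPTION_ID_OFFSET = 1_000_000_000


def merge_models(article_reads, subscriptions,
                 article_weight=1.0, subscription_weight=2.0,
                 offset=SUBSCRIPTION_ID_OFFSET):
    """Combine (user, item) pairs from both data sets into one ID space.

    Subscription IDs are shifted by `offset` so they cannot collide with
    article IDs, and each row carries the weight of its source data set.
    """
    max_article_id = max((item for _, item in article_reads), default=-1)
    if max_article_id >= offset:
        raise ValueError("offset must be larger than every article ID")

    rows = [(user, item, article_weight) for user, item in article_reads]
    rows += [(user, topic + offset, subscription_weight)
             for user, topic in subscriptions]
    return rows


# Made-up sample data: (userID, articleID) and (userID, topicID) pairs.
reads = [(1, 101), (1, 102), (2, 101)]
subs = [(1, 7), (2, 7)]
merged = merge_models(reads, subs)
```

The explicit check on the largest article ID is there because a silent collision between an article and a shifted subscription would be very hard to spot later.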

I'm not sure if this would work, however, because a preference score isn't exactly analogous to the sort of weighting I'm after; preference scores probably carry some notion of lower scores representing dissatisfaction.

I have to imagine there's a better way to do this or at least that there are tweaks to my plan which would make it work more along the lines I desire.

Recommended answer

I think you're thinking of it in the right way. Yes, you want a bit more expressiveness than a simple exists/doesn't exist for subscriptions and articles, since they mean somewhat different things. I would suggest picking weights that reflect their relative frequency, so that the rarer event counts for more. For example, if users have read 100K articles over all time and made 10K subscriptions, then you might pick a subscription weight of "10" and a read weight of "1".
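
As a rough sketch of that arithmetic, the weights can be derived from the event counts by giving the most common event type a weight of 1 and scaling the rarer ones up in proportion. The helper name `frequency_weights` and the dictionary keys are illustrative assumptions; the counts are the ones from the example above.

```python
def frequency_weights(event_counts):
    """Weight each event type inversely to how often it occurs.

    The most frequent event type gets weight 1.0; rarer types get
    proportionally larger weights.
    """
    if any(count <= 0 for count in event_counts.values()):
        raise ValueError("every event type needs a positive count")
    most_common = max(event_counts.values())
    return {event: most_common / count
            for event, count in event_counts.items()}


# Counts taken from the example: 100K article reads, 10K subscriptions.
counts = {"read": 100_000, "subscription": 10_000}
weights = frequency_weights(counts)
```

With these counts the read weight comes out as 1.0 and the subscription weight as 10.0, matching the figures suggested above.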

This doesn't quite work if you treat those values as preference scores, for a number of reasons. It works better if you use an approach that treats them as what they are, which is linear weights.

I would point you to the ALS-WR algorithm, which is specifically designed for this type of input. See, for example, the paper "Collaborative Filtering for Implicit Feedback Datasets".
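
To make the "linear weights" idea concrete, here is the implicit-feedback formulation as I understand it from that paper, written out as a sketch rather than as a description of any particular implementation. The symbols are the paper's, not Mahout's: r_{ui} is the observed weight for user u and item i (for instance 1 for a read and 10 for a subscription), p_{ui} is the binary preference, c_{ui} is the confidence in it, x_u and y_i are the user and item factor vectors, and \alpha and \lambda are tuning parameters.

```latex
% Binary preference: did user u interact with item i at all?
p_{ui} =
\begin{cases}
1 & \text{if } r_{ui} > 0 \\
0 & \text{if } r_{ui} = 0
\end{cases}

% Confidence grows linearly with the observed weight.
c_{ui} = 1 + \alpha \, r_{ui}

% Confidence-weighted least squares with L2 regularization.
\min_{x_*,\, y_*} \;
\sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

The point for this question is that a larger weight never means "likes it more" or "dislikes it less"; it only raises the confidence that the interaction reflects a real preference, which is why a subscription weight of 10 behaves differently here than a preference score of 10 would.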

This is implemented in Mahout as ParallelALSFactorizationJob on Hadoop. It works nicely, though it requires Hadoop. (I can't take credit for that, though I did write most of the recommender code in Mahout.)

Advertisement: I'm working on commercializing a "next generation" system, evolved from my work in Mahout, called Myrrix. It is an implementation of ALS-WR and is ideal for your kind of input. It's quite easy to download and run, and doesn't need Hadoop.

Given that it may be directly suitable for your problem I don't mind plugging it here.
