Extend Mahout for new dataset

Question

I want to build a recommendation model based on Mahout. My dataset format has extra columns other than userID, itemID, rating and timestamp. Thus, I think I need to extend the FileDataModel.

I looked into JesterDataModel as an example. However, I have a problem with the logic flow. In its buildModel() method, an empty map "data" is first constructed and then passed to processFile. I assume that "data" is modified in this method, since it is later used to construct the GenericDataModel. However, data is a local variable rather than a class field, so how is it modified?

processFile(iterator, data, timestamps, false);
return new GenericDataModel(GenericDataModel.toDataMap(data, true));
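On the local-variable point: Java passes object references by value, so a method that receives the map can add entries to the very object the caller created, even though the caller's variable is local. Presumably that is what processFile does with "data". A minimal sketch in plain Java, where fillFromLines and buildData are hypothetical stand-ins for processFile and buildModel, not Mahout code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalMapDemo {

    // Hypothetical stand-in for processFile: it receives a copy of the
    // reference to the caller's map and adds entries to that same object.
    static void fillFromLines(List<String> lines, Map<Long, List<String>> data) {
        for (String line : lines) {
            String[] fields = line.split(",");
            long userID = Long.parseLong(fields[0].trim());
            data.computeIfAbsent(userID, k -> new ArrayList<>()).add(line);
        }
    }

    // Hypothetical stand-in for buildModel: "data" is only a local variable.
    static Map<Long, List<String>> buildData(List<String> lines) {
        Map<Long, List<String>> data = new HashMap<>(); // empty at first
        fillFromLines(lines, data);                      // filled in place
        return data;                                     // populated on return
    }

    public static void main(String[] args) {
        Map<Long, List<String>> data =
                buildData(Arrays.asList("1,10,4.5", "1,11,3.0", "2,10,2.0"));
        System.out.println(data.get(1L).size()); // prints 2
        System.out.println(data.size());         // prints 2
    }
}
```

Reassigning the parameter inside fillFromLines (data = new HashMap<>()) would not affect the caller; only changes made through the shared reference are visible.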

Recommended answer

I see. I believe you would have to rewrite major parts, such as the DataModel and the similarity calculations, to make that work. You can look at the Rescorer, which lets you introduce your own logic to filter items out or boost other items based on your requirements.

In chapter 5 of the Mahout in Action book there is an example of how to use the Rescorer class. You can see the code here (link)
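This is not the book's example, only a rough sketch of the idea. It assumes the extra column is a per-item "genre" (a made-up attribute) held in a side map, and it declares a local interface with the same two methods as Mahout's IDRescorer (rescore and isFiltered) so that it compiles without Mahout on the classpath. The class names and the 1.5 boost factor are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class GenreRescorerSketch {

    // Local stand-in mirroring the two methods of Mahout's IDRescorer.
    interface IdRescorerLike {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    // Hypothetical rescorer driven by an extra "genre" column per item.
    static class GenreRescorer implements IdRescorerLike {
        private final Map<Long, String> genreByItem;
        private final String boostedGenre;
        private final String blockedGenre;

        GenreRescorer(Map<Long, String> genreByItem,
                      String boostedGenre, String blockedGenre) {
            this.genreByItem = genreByItem;
            this.boostedGenre = boostedGenre;
            this.blockedGenre = blockedGenre;
        }

        // Boost items of the preferred genre; leave other scores unchanged.
        @Override
        public double rescore(long id, double originalScore) {
            return boostedGenre.equals(genreByItem.get(id))
                    ? originalScore * 1.5
                    : originalScore;
        }

        // Exclude items of the blocked genre from the results entirely.
        @Override
        public boolean isFiltered(long id) {
            return blockedGenre.equals(genreByItem.get(id));
        }
    }

    public static void main(String[] args) {
        Map<Long, String> genres = new HashMap<>();
        genres.put(10L, "comedy");
        genres.put(11L, "horror");
        GenreRescorer rescorer = new GenreRescorer(genres, "comedy", "horror");
        System.out.println(rescorer.rescore(10L, 4.0)); // prints 6.0
        System.out.println(rescorer.isFiltered(11L));   // prints true
    }
}
```

With Mahout itself, the class would implement org.apache.mahout.cf.taste.recommender.IDRescorer and be passed as the third argument to Recommender.recommend(userID, howMany, rescorer). The DataModel then keeps only the standard columns while the extra columns live in the side map.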
