User matching with current data

Question

I have a database containing two different types of users, Mentors and Mentees. I want members of the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Both Mentors and Mentees can change items in their profiles at any time.

Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem is that I have to reload the user data every single time anyone searches. Loading by itself doesn't take long, but Mahout's processing of the data takes a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes only seconds. While it is processing, I also get the same INFO message over and over again ("Processed 2248 users"), even though the code suggests the message should only be printed every 10000 users.

I'm using GenericUserBasedRecommender and GenericDataModel, together with NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load the Mentors from the database, add the Mentee to the resulting list of POJOs, and convert the list to a FastByIDMap that I pass to the DataModel.
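
To make the setup above concrete, here is a library-free Java sketch of what a Pearson-correlation user similarity and a nearest-N neighborhood compute: the correlation of two users' ratings over the items both have rated, and the N candidates with the highest correlation. It assumes each profile can be expressed as a map from a numeric item ID to a rating. The class and method names are hypothetical, and this is not Mahout's code; Mahout's PearsonCorrelationSimilarity and NearestNUserNeighborhood differ in details such as optional weighting and the use of a PreferenceInferrer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free sketch; not Mahout's implementation.
class PearsonNeighborhoodSketch {

    // Pearson correlation of two users' ratings, computed only over the items
    // both users have rated. Returns NaN when there are fewer than two shared
    // items or when either user's shared ratings are all identical.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> shared = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) {
                shared.add(item);
            }
        }
        int n = shared.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0.0, meanB = 0.0;
        for (Long item : shared) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (Long item : shared) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return Double.NaN;
        }
        return cov / Math.sqrt(varA * varB);
    }

    // IDs of the (at most) n candidates most correlated with `query`, best first.
    // Candidates whose correlation is undefined (NaN) are skipped.
    static List<Long> nearestN(Map<Long, Double> query,
                               Map<Long, Map<Long, Double>> candidates, int n) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : candidates.entrySet()) {
            double s = pearson(query, e.getValue());
            if (!Double.isNaN(s)) {
                scores.put(e.getKey(), s);
            }
        }
        List<Long> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }
}
```

Note that Pearson needs overlapping rated items: a Mentor who shares fewer than two items with the Mentee gets no score at all in this sketch.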

Is there a better way to do this? The product owner requires the data to be current for every search.

Recommended answer

(I'm the author.)

You shouldn't need to ask it to reload the data every time. Why are you doing that?
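
One way to avoid reloading on every search while still keeping results current, as the product owner requires, is to build the model once and rebuild it only after a profile has actually changed. Below is a library-independent Java sketch of that pattern. `StaleOnWriteCache` and `markStale()` are hypothetical names, and the builder is assumed to wrap whatever code currently loads the users and constructs the recommender. Mahout's Taste components also implement a `Refreshable` interface intended for reloading data, which may be the more direct route; check its documentation before rolling your own.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: holds an expensively built object (for example a
// recommender built on a freshly loaded data model) and rebuilds it only
// after something has marked it stale.
class StaleOnWriteCache<T> {
    private final Supplier<T> builder;
    // Starts stale so the first get() performs the initial build.
    private final AtomicBoolean stale = new AtomicBoolean(true);
    private T current;

    StaleOnWriteCache(Supplier<T> builder) {
        this.builder = builder;
    }

    // Call from the code path that saves a Mentor or Mentee profile change.
    void markStale() {
        stale.set(true);
    }

    // Searches call this; a rebuild happens only if a change was recorded.
    synchronized T get() {
        // Clear the flag *before* building, so a change saved while the
        // build is running marks the cache stale again for the next call.
        if (stale.getAndSet(false)) {
            try {
                current = builder.get();
            } catch (RuntimeException e) {
                stale.set(true); // build failed: retry on the next call
                throw e;
            }
        }
        return current;
    }
}
```

Because `get()` is synchronized, searches that arrive during a rebuild wait for it to finish rather than starting their own, so at most one build runs at a time.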

14 minutes also sounds far too long to load such a small amount of data; something is wrong. You might follow up with more information at user@mahout.apache.org.

You are seeing log messages from a DataModel, which you can disable in your logging system of choice. Besides the periodic messages, it prints one final count when it finishes. This is nothing to worry about.

I would advise against using a PreferenceInferrer unless you are sure you need one. Do you actually have ratings here? If not, I might suggest LogLikelihoodSimilarity.
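
If the profiles carry no real ratings (for example, each profile is just a set of hypothetical skill or interest IDs), a log-likelihood similarity compares users by which items they share rather than by rating values. The Java sketch below computes Dunning's log-likelihood ratio on the 2x2 table of shared and unshared items and maps it into [0, 1) with 1 - 1/(1 + LLR). It illustrates the statistic and is not Mahout's code; we believe Mahout's LogLikelihoodSimilarity works along these lines, but in practice you would use that class directly. `numItems`, the total number of distinct profile items, is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of Dunning's log-likelihood ratio (G^2) on a 2x2
// contingency table, applied to two users' item sets (presence only, no ratings).
class LogLikelihoodSketch {

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy (N * H) of the given counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sumXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumXLogX;
    }

    // k11: both users have the item; k12: only user B; k21: only user A; k22: neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);    // split on user B
        double columnEntropy = entropy(k11 + k21, k12 + k22); // split on user A
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Maps the unbounded ratio into [0, 1); numItems is the size of the item catalogue.
    static double userSimilarity(Set<Long> itemsA, Set<Long> itemsB, long numItems) {
        Set<Long> both = new HashSet<>(itemsA);
        both.retainAll(itemsB);
        long k11 = both.size();
        long k12 = itemsB.size() - k11;
        long k21 = itemsA.size() - k11;
        long k22 = numItems - itemsA.size() - itemsB.size() + k11;
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

When two users overlap exactly as much as chance would predict from their profile sizes (for example, a table of 1, 1, 1, 1), the ratio is 0 and so is the similarity; the further the overlap departs from chance, the closer the score gets to 1.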
