Evaluating recommenders - unable to recommend in x cases
Question
I'm exploring some of the code examples in Mahout in Action in more detail. I have built a small test that computes the RMS error of various algorithms applied to my data.
Of course, multiple parameters affect the RMS, but I don't understand the "unable to recommend in ... cases" message that is generated while running an evaluation.
Looking at StatsCallable.java, this message is generated when an evaluator encounters a NaN response; perhaps there is not enough data in the training set or in the user's prefs to provide a recommendation.
It seems like the RMS score isn't affected by a very large set of "unable to recommend" cases. Is that assumption correct? Should I be evaluating my algorithm not only on RMS but also on the ratio of "unable to recommend" cases to my overall training set?
Thanks for your feedback.
Recommended answer
Yes, this essentially means there was no data at all on which to base an estimate. That's generally a symptom of data sparseness. It should be rare, and happen only for users whose data is very small or disconnected from others'.
I personally think it's not such a big deal unless it's a really significant percentage (20%+?). I'd worry more if you couldn't generate any recs at all for many users.
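To make the two numbers concrete, here is a minimal sketch in plain Java of tracking RMS alongside the share of "unable to recommend" cases. It is not Mahout code: the class `EvaluationSummary` and its methods are hypothetical, and it assumes, as the question's reading of StatsCallable.java suggests, that a NaN estimate is counted and then left out of the error average. Under that assumption the RMS only reflects the cases that did get an estimate, which is why a large NaN count would not move it.

```java
// Hypothetical helper, not part of Mahout: summarizes a held-out evaluation
// given the actual preferences and the recommender's estimates for them.
final class EvaluationSummary {
    final double rmse;         // RMS error over cases that produced an estimate
    final int estimatedCases;  // cases with a usable (non-NaN) estimate
    final int noEstimateCases; // cases where the estimate was NaN

    private EvaluationSummary(double rmse, int estimatedCases, int noEstimateCases) {
        this.rmse = rmse;
        this.estimatedCases = estimatedCases;
        this.noEstimateCases = noEstimateCases;
    }

    // Assumption: NaN estimates are skipped when averaging, so they
    // change the NaN count but never the RMS itself.
    static EvaluationSummary of(double[] actual, double[] estimates) {
        if (actual.length != estimates.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sumOfSquares = 0.0;
        int estimated = 0;
        int noEstimate = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Double.isNaN(estimates[i])) {
                noEstimate++;
                continue;
            }
            double diff = estimates[i] - actual[i];
            sumOfSquares += diff * diff;
            estimated++;
        }
        // With no usable estimates at all, the RMS is undefined.
        double rmse = estimated == 0 ? Double.NaN : Math.sqrt(sumOfSquares / estimated);
        return new EvaluationSummary(rmse, estimated, noEstimate);
    }

    // Fraction of evaluated cases for which no estimate could be made.
    double noEstimateRatio() {
        int total = estimatedCases + noEstimateCases;
        return total == 0 ? 0.0 : (double) noEstimateCases / total;
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 3.0, 5.0, 2.0, 1.0};
        double[] estimates = {3.0, Double.NaN, 5.0, Double.NaN, 2.0};
        EvaluationSummary s = EvaluationSummary.of(actual, estimates);
        System.out.printf("RMS over %d estimated cases: %.4f%n", s.estimatedCases, s.rmse);
        System.out.printf("Unable to recommend in %d cases (ratio %.2f)%n",
                s.noEstimateCases, s.noEstimateRatio());
    }
}
```

Note that the ratio here is taken over the evaluated test cases, not over the whole training set. Comparing it against a rough cutoff such as the 20% mentioned above is a judgment call, not a rule.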