Generating a test dataset for a recommendation system from existing data
Problem description
I am trying to build a recommender system using the Scala API for Apache Spark.
I have a dataset of the form (User, Product, Rating) covering every item that each user has rated. To make recommendations, I need a dataset of the form (U, P) containing all (U, P) pairs that are not present in my initial dataset, i.e. for each user, every product that user has not bought. Does anyone know of a straightforward way to do this?
P.S. You can assume there are no users or items other than those in the initial set.
Any help will be appreciated.
Recommended answer
This tutorial may be helpful:
http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html
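For the specific question of building the missing (U, P) pairs, one possible approach is to take the cartesian product of the distinct users and distinct products and subtract the pairs that already have a rating. The following is only a minimal sketch, not taken from the tutorial: it assumes the ratings are held in an RDD of (Int, Int, Double) tuples, and the names UnratedPairs and unratedPairs as well as the sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the name is an assumption for this sketch.
object UnratedPairs {

  // Returns every (user, product) pair that does NOT occur in `ratings`.
  // Only users and products seen in `ratings` are considered, which matches
  // the assumption stated in the question.
  def unratedPairs(ratings: RDD[(Int, Int, Double)]): RDD[(Int, Int)] = {
    // Pairs that already have a rating.
    val rated = ratings.map { case (user, product, _) => (user, product) }.distinct()

    // Distinct user ids and product ids present in the data.
    val users    = rated.map(_._1).distinct()
    val products = rated.map(_._2).distinct()

    // All possible pairs minus the ones already rated.
    users.cartesian(products).subtract(rated)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("UnratedPairs").setMaster("local[*]")
    val sc   = new SparkContext(conf)
    try {
      // Made-up sample ratings: (user, product, rating).
      val ratings = sc.parallelize(Seq(
        (1, 10, 5.0),
        (1, 20, 3.0),
        (2, 10, 4.0),
        (3, 30, 2.0)
      ))

      // Collect and sort only so the small demo output is easy to read.
      unratedPairs(ratings).collect().sorted.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the cartesian product has (number of users) x (number of products) elements, so this can become expensive on large datasets. If you train an MLlib ALS model, the resulting RDD[(Int, Int)] should be usable as input to MatrixFactorizationModel.predict to score the unrated pairs.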