(Poor man's) product recommendation implementation


Question

I am trying to build a poor man's recommendation system for an online store. I want to implement something like Amazon's "Customers Who Bought This Item Also Bought" feature, and I have read a lot about it. I know there is Apache Mahout, but I am unable to tweak the server that way. There is also the Google Prediction API, but it costs money, so I started experimenting myself.

I have an order history with 250,000+ items, and I wrote a nested MySQL query that finds the orders containing the current article, ranks the other items in those orders, and sorts the result by rank. That gives me a set of products which other people ordered along with the current article.

The problem is that the query can take up to 10 seconds, so it can't be used directly. I thought about a caching table, but the query that fills it stops after 20 minutes (there are 60,000 products and 250,000 ordered items), so I am unable to fill that table.

My current workaround is the following: the recommendation HTML is loaded via AJAX on document ready, so the page loads while the recommendations load in the background. The recommendation data is computed once and stored in a file cache (PEAR simple cache), so it loads faster the next time. The cache is therefore built on demand when someone visits the page, and it is kept for a day or maybe a week.
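To make the on-demand caching idea concrete, here is a minimal sketch in Python rather than PHP/PEAR. The names `CACHE_DIR`, `TTL_SECONDS`, `cache_path` and `get_recommendations` are made up for illustration, and `compute` stands for whatever runs the slow query.

```python
import json
import os
import tempfile
import time

CACHE_DIR = tempfile.mkdtemp(prefix="reco_cache_")  # hypothetical cache location
TTL_SECONDS = 24 * 60 * 60  # keep an entry for one day


def cache_path(artnr):
    # One file per article; real article numbers may need sanitising first.
    return os.path.join(CACHE_DIR, artnr + ".json")


def get_recommendations(artnr, compute, now=None):
    """Return cached recommendations, recomputing when missing or stale."""
    now = time.time() if now is None else now
    path = cache_path(artnr)
    try:
        if now - os.path.getmtime(path) < TTL_SECONDS:
            with open(path) as fh:
                return json.load(fh)
    except OSError:
        pass  # no cache file yet
    result = compute(artnr)  # the slow query runs only on a miss
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(result, fh)
    os.replace(tmp, path)  # swap in the finished file so readers never see a partial one
    return result
```

With this shape the first visitor of a product page pays for the slow query and everyone else within the TTL reads a small file, which is the behaviour described above.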

I ask myself and you: is that an acceptable approach, or is it stupid and slow? Would it be better to store the cached data in a database or in files (I am thinking about performance and parallel hits)? I mean, in the worst case I would end up with 60,000 cache files.

I would prefer a precomputed table with all the data, but as I said, it takes too long and I don't know how to optimize it. (Waiting until the SQL dude comes back from holidays ^^)
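One way such a precomputed table could be filled is with a single grouped self-join over all order positions, instead of one query per article. The sketch below is only an illustration under assumptions: it uses Python's built-in SQLite in place of MySQL, the `reco_cache` table, the `idx_order_art` index and the sample rows are invented, and whether it finishes fast enough on the real data is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE net_orderposition (ID_order INTEGER, ArtNr TEXT);
-- Assumed index: lets the self-join look up the other positions of an order.
CREATE INDEX idx_order_art ON net_orderposition (ID_order, ArtNr);
-- Hypothetical cache table: one row per (parent, child) pair.
CREATE TABLE reco_cache (
    parent_artnr TEXT,
    child_artnr  TEXT,
    pair_count   INTEGER,
    PRIMARY KEY (parent_artnr, child_artnr)
);
""")

# Made-up sample orders, only to exercise the query.
sample = [
    (1, "TT-PV0005"), (1, "TT-PV0040"), (1, "TT-PV0355"),
    (2, "TT-PV0005"), (2, "TT-PV0040"),
    (3, "TT-PV0005"), (3, "TT-PV0355"),
    (4, "TT-PV0040"), (4, "RH-01V2"),
    (5, "TT-PV0005"), (5, "TT-PV0040"),
]
conn.executemany("INSERT INTO net_orderposition VALUES (?, ?)", sample)


def rebuild_cache(conn):
    """Recount every co-ordered pair in one pass instead of one query per article."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM reco_cache")
        conn.execute("""
            INSERT INTO reco_cache (parent_artnr, child_artnr, pair_count)
            SELECT s.ArtNr, c.ArtNr, COUNT(*)
            FROM net_orderposition s
            JOIN net_orderposition c
              ON c.ID_order = s.ID_order AND c.ArtNr <> s.ArtNr
            GROUP BY s.ArtNr, c.ArtNr
        """)


def top_related(conn, artnr, limit=10):
    """Read the precomputed top list for one article."""
    return conn.execute(
        "SELECT child_artnr, pair_count FROM reco_cache "
        "WHERE parent_artnr = ? ORDER BY pair_count DESC, child_artnr LIMIT ?",
        (artnr, limit),
    ).fetchall()


rebuild_cache(conn)
print(top_related(conn, "TT-PV0005"))
```

The page request then only does a primary-key range read on `reco_cache`, and the expensive join runs in an offline job.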

Thanks for any hints or opinions.

BTW, this is the query:

SELECT c.ArtNr AS artnr, COUNT(c.ArtNr) AS rank, s.ArtNr AS parent_artnr
FROM (
    SELECT a.ID_order, a.ArtNr
    FROM net_orderposition a
    WHERE a.ArtNr = 'TT-PV0005'
) s
JOIN net_orderposition c
WHERE s.ID_order = c.ID_order AND s.ArtNr != c.ArtNr
GROUP BY c.ArtNr
ORDER BY rank DESC, c.Stamp DESC
LIMIT 10;

Edit:

I thought about the given answers, and I think they are similar to my initial idea. The above code results in the following table:

ID, ParentID,  ChildID,   Rank
1,  TT-PV0005, TT-PV0040, 220
2,  TT-PV0005, TT-PV0355, 135
3,  TT-PV0005, TT-PV0450, 134
4,  TT-PV0005, TT-PV0451, 89
5,  TT-PV0005, RH-01V2,   83
6,  TT-PV0005, TT-PV0041, 83
7,  TT-PV0005, TT-PV0353, 82
8,  TT-PV0005, TT-PV0037, 80

ParentID is the current item, ChildID is an item that was ordered in the past along with ParentID, and Rank is the precomputed count of how often the child was ordered together with the current item. Now I can UPDATE or INSERT related items on every new order and increment Rank if the pair is already present in the DB. The only thing I fear is that I will end up with a really, really big table. Maybe that wouldn't be a problem if I precalculate it offline once a week? But then I have to optimize the query so it doesn't take 10 seconds per item.
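The "UPDATE or INSERT on every new order" idea could look roughly like the sketch below. It is again Python with the built-in SQLite as a stand-in for MySQL; the `reco_cache` table and the `record_order` and `pair_count` helpers are hypothetical names. In MySQL the two statements would typically collapse into a single `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reco_cache (
        parent_artnr TEXT,
        child_artnr  TEXT,
        pair_count   INTEGER NOT NULL,
        PRIMARY KEY (parent_artnr, child_artnr)
    )
""")


def record_order(conn, artnrs):
    """Bump the counter for every ordered pair of distinct articles in one order."""
    with conn:  # one transaction per order
        for parent, child in permutations(sorted(set(artnrs)), 2):
            # Create the pair with count 0 if it is new, otherwise do nothing.
            conn.execute(
                "INSERT OR IGNORE INTO reco_cache (parent_artnr, child_artnr, pair_count) "
                "VALUES (?, ?, 0)",
                (parent, child),
            )
            conn.execute(
                "UPDATE reco_cache SET pair_count = pair_count + 1 "
                "WHERE parent_artnr = ? AND child_artnr = ?",
                (parent, child),
            )


def pair_count(conn, parent, child):
    """Return how often child was ordered together with parent (0 if never)."""
    row = conn.execute(
        "SELECT pair_count FROM reco_cache WHERE parent_artnr = ? AND child_artnr = ?",
        (parent, child),
    ).fetchone()
    return row[0] if row else 0


# Two made-up orders to show the counters moving.
record_order(conn, ["TT-PV0005", "TT-PV0040", "TT-PV0355"])
record_order(conn, ["TT-PV0005", "TT-PV0040"])
```

On the size worry: the table only gets a row for pairs that were actually ordered together at least once, so it grows with the number of distinct co-ordered pairs rather than with every possible combination of the 60,000 products.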

What do you think?

Recommended answer

Check out easyrec. It has the features you need and is free. No tweaking is needed, and you can use the demo instance much like Google Analytics. I think it will be much easier to just use this free web service than to code the whole logic on your own.

In a tweet today they mentioned full Mahout support in easyrec, so you get the whole thing with easyrec. You can either use easyrec's free web service or deploy the free WAR file on your own web server.
