频繁项集SQL [英] Frequent itemset SQL
问题描述
我正在使用SAS完成一部分课程.目前,我有一组订单ID和产品ID.我想找出最常一起订购的产品.想一想,在杂货篮里加牛奶和谷类食品.
I'm using SAS for a piece of coursework. At the moment, I have a set of Order IDs and Product IDs. I want to found out which products are most frequently ordered together. Think, milk and cereal in a grocery basket.
我不是很擅长编程,因此,如果有人可以花一点时间并编写几行可以轻松使用的SQL,我将不胜感激.它不是一个繁重的数据集,只有两列(Order_ID和Product_ID)
I am not very good at programming, so would really appreciate if anyone could spare a bit of time and write a simple few lines of SQL I can easily use. Its not a heavy dataset and there are only two columns (Order_ID and Product_ID)
例如:
订单ID 产品ID
10001 64564564
10001 64564564
10001 546456
10001 54646
10003 5464
10003 342346
我已经花了三个小时进行研究,但有些绝望:(
I've spent three hours researching now and am a bit desperate :(
推荐答案
如果仔细考虑,您可以通过以下方式提出问题来找到答案:对于每对可能的产品,这两种产品出现了多少次以相同的顺序.然后按计数顺序将答案浮到顶部:
If you think about it, you can find the answer by asking the question this way: for every possible pair of products, how many times did the two products occur on the same order. Then order by the count to float the answer(s) to the top:
select
p1.product_id, p2.product_id, count(*) times_order_together
from
orders p1
inner join
orders p2
on
p1.order_id = p2.order_id
and
p1.product_id != p2.product_id
group by
p1.product_id, p2.product_id
order by
count(*) desc
从未一起订购的产品根本不会出现.而且-对被两次代表-一行加牛奶的鸡蛋和一行加鸡蛋的牛奶.这些重复的对是可移动的-但变得难看-简单就好.
Products that weren't ever ordered together don't show up at all. Also - pairs are represented twice - a row for eggs with milk and a row for milk with eggs. These duplicate pairs are removable - but it gets uglier - and simple is good.
为了详细说明, p1
和 p2
是订单的别名.您这样做是为了能够不止一次使用数据源-并在它们之间进行区分.另外, count(*)times_order_together
只是为计算 count(*)
指定了"times_order_together"名称.它计算的是订单中产品配对发生的次数.
To elaborate a bit, p1
and p2
are aliases of orders. You do that to be able to use a data source more than once - and yet distinguish between them. Also, the count(*) times_order_together
is just giving the name 'times_order_together' to the calculation count(*)
. It's counting the number of times a product pairing occurs in an order.
这篇关于频繁项集SQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!