频繁项集SQL [英] Frequent itemset SQL

查看:90
本文介绍了频繁项集SQL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用SAS完成一部分课程.目前,我有一组订单ID和产品ID.我想找出最常一起订购的产品.想一想,在杂货篮里加牛奶和谷类食品.

I'm using SAS for a piece of coursework. At the moment, I have a set of Order IDs and Product IDs. I want to found out which products are most frequently ordered together. Think, milk and cereal in a grocery basket.

我不是很擅长编程,因此,如果有人可以花一点时间并编写几行可以轻松使用的SQL,我将不胜感激.它不是一个繁重的数据集,只有两列(Order_ID和Product_ID)

I am not very good at programming, so would really appreciate if anyone could spare a bit of time and write a simple few lines of SQL I can easily use. Its not a heavy dataset and there are only two columns (Order_ID and Product_ID)

例如:

订单ID 产品ID

10001 64564564

10001 64564564

10001 546456

10001 54646

10003 5464

10003 342346

我已经花了三个小时进行研究,但有些绝望:(

I've spent three hours researching now and am a bit desperate :(

推荐答案

如果仔细考虑,您可以通过以下方式提出问题来找到答案:对于每对可能的产品,这两种产品出现了多少次以相同的顺序.然后按计数顺序将答案浮到顶部:

If you think about it, you can find the answer by asking the question this way: for every possible pair of products, how many times did the two products occur on the same order. Then order by the count to float the answer(s) to the top:

select 
    p1.product_id, p2.product_id, count(*) times_order_together 
from
    orders p1
    inner join
    orders p2
    on
        p1.order_id = p2.order_id
        and
        p1.product_id != p2.product_id 
group by
    p1.product_id, p2.product_id
order by
    count(*) desc 

从未一起订购的产品根本不会出现.而且-对被两次代表-一行加牛奶的鸡蛋和一行加鸡蛋的牛奶.这些重复的对是可移动的-但变得难看-简单就好.

Products that weren't ever ordered together don't show up at all. Also - pairs are represented twice - a row for eggs with milk and a row for milk with eggs. These duplicate pairs are removable - but it gets uglier - and simple is good.

为了详细说明, p1 p2 是订单的别名.您这样做是为了能够不止一次使用数据源-并在它们之间进行区分.另外, count(*)times_order_together 只是为计算 count(*)指定了"times_order_together"名称.它计算的是订单中产品配对发生的次数.

To elaborate a bit, p1 and p2 are aliases of orders. You do that to be able to use a data source more than once - and yet distinguish between them. Also, the count(*) times_order_together is just giving the name 'times_order_together' to the calculation count(*). It's counting the number of times a product pairing occurs in an order.

这篇关于频繁项集SQL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆