使用SQL查询(模糊先验算法)进行数据挖掘操​​作-使用SQL对其进行编码 [英] Data Mining Operation using SQL Query (Fuzzy Apriori Algorithm) - Coding it using SQL

查看:68
本文介绍了使用SQL查询(模糊先验算法)进行数据挖掘操​​作-使用SQL对其进行编码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我有这张桌子:

Trans_ID    Name    Fuzzy_Value    Total_Item  
100          I1  0.33333333        3  
100          I2  0.33333333        3  
100          I5  0.33333333        3  
200          I2  0.5               2  
200          I5  0.5               2  
300          I2  0.5               2  
300          I3  0.5               2  
400          I1  0.33333333        3  
400          I2  0.33333333        3  
400          I4  0.33333333        3  
500          I1  0.5               2  
500          I3  0.5               2  
600          I2  0.5               2  
600          I3  0.5               2  
700          I1  0.5               2  
700          I3  0.5               2  
800          I1  0.25              4  
800          I2  0.25              4  
800          I3  0.25              4  
800          I5  0.25              4  
900          I1  0.33333333        3  
900          I2  0.33333333        3  
900          I3  0.33333333        3  
1000         I1  0.2               5  
1000         I2  0.2               5  
1000         I4  0.2               5  
1000         I6  0.2               5  
1000         I8  0.2               5  

和两个空白表:

Table  ITEMSET
  
"ITEM_SET" "Support" 



Table Confidence
  
"ANTECEDENT" "CONSEQUENT" 


我需要为每笔交易中出现的每个项目找到FUZZY值:


I need to find FUZZY value for each item that occurs in each transaction:

I1 = Sum of (Fuzzy_Value from item I1 in trans 100 until 1000 which is trans: 100,400,500,700,800,900,1000)/Total Trans  
-> (.33333333+0.33333333+0.5+0.5+0.25+0.33333333+0.2)/10 = 0.244999999

  
I2 = Sum of (Fuzzy_Value from item I2 in trans 100 - 1000 which is trans:100,200,300,400,600,800,900,1000)/Total Trans  
-> (0.33333333+0.5+0.5+0.33333333+0.5+0.25+0.33333333)/10 = 0.274999999


I3 -> 0.258333333  
I4 -> 0.103333333  
I5 -> 0.058333333    
I6 -> 0.02    
I8 -> 0.02    

例如,我使用最低支持10%->0.1
我需要删除I5,I6,I8,因为它的值<0.1 =>修剪步骤

E.g., I use minimum Support 10% -> 0.1
I need to remove I5,I6,I8 since it's value < 0.1 => prune step

然后存储:

I1=0.244999999, I2=0.274999999, I3=0.258333333,I4=0.103333333  on new table 'ITEMSET' 


2个组合

注意:这是最可能需要使用重复或递归操作的第一步,因为此过程将一直进行下去,直到无法进行其他任何项组合为止
然后从剩下的东西中找到K + 1个项目集(这是2个组合项目集)=>加入步骤

{I1,I2} =Sum of (Fuzzy_Value from item I1 + I2 in trans 100 - 1000 which is trans:100,400,800,900,1000)/Total Trans 
->(0.666666667+0.666666667+0.5+0.666666667+0.4)/9 = 0.29

*do the same for the rest*
{I1,I3} =(1+1+0.5+0.666666667)/9 = 0.316666667
{I1,I4} =(0.666666667+0.4)/9 = 0.106666667
{I2,I3} =(1+1+0.5+0.666666667)/9 = 0.316666667
{I2,I4} =(1+0.666666667+0.4)/9 =0.206666667
{I3,I4} =0  

然后再执行一次修剪步骤,删除小于0.1的值,即{I3,I4}

Then Do another Prune Step removing less than 0.1 value which is {I3,I4}

Store {I1,I2} = 0.29, {I1,I3} = 0.316666667, {I1,I4} =0.106666667, {I2,I3} = 0.316666667, {I2,I4} = 0.206666667  AT "ITEMSET" TABLE 


3个组合

此后,再次进行 JOIN STEP 合并通过修剪的项集

After that Do another JOIN STEP combining itemset that pass pruning

{I1,I2,I3} = Sum of (Fuzzy_Value from item I1 + I2 +I3 in trans 100 - 1000 which is trans:800,900)/Total Trans  
-> 0.75+1 = 0.175  
**Same for the rest**  
{I1,I2,I4} = 1+0.6 = 0.16  
{I2,I3,I4} = 0  

再执行一次修剪步骤,删除小于0.1的值,即{I1,I3,I4}

Do another Prune Step removing less than 0.1 value which is {I1,I3,I4}

Store {I1,I2,I3} = 0.176 AND {I1,I2,I4} = 0,16 AT "ITEMSET" TABLE  


4个组合

通过修剪K + 4(4个组合)的组合项集

Combine itemset that pass pruning K+4 (4 combination)

{I1,I2,I3,I4} = 0

**因为没有交易包含该物品

**since no transaction containing this item

在处理停止后,因为没有可能的组合了

after process stop since there's no possible combination left

此时,ITEMSET数据库具有:

At this point, ITEMSET database have:

ITEM_SET           Support  
{I1}               0.244999999
{I2}               0.274999999     
{I3}               0.258333333    
{I4}               0.103333333  
{I1,I2}            0.29    
{I1,I3}            0.316666667  
{I1,I4}            0.106666667  
{I2,I3}            0.316666667  
{I2,I4}            0.206666667  
{I1,I2,I3}         0.176  
{I1,I2,I4}         0,16  


如何在sql中编写代码?非常感谢你!


How do I code that in sql? Thank you very much!

注意:您可以根据需要添加另一个表.

Note: You can add another table as needed.

推荐答案

步骤1:

CREATE TABLE ITEMSET
SELECT Name, SUM(Fuzzy_Value)/COUNT(*) Fuzzy_Value
FROM trans
GROUP BY ID
HAVING ROUND(SUM(Fuzzy_Value), 1) >= 0.1

请注意 ROUND()函数-这很重要,因为您的值.33333之类的值不能以合计的方式累加.

Note the ROUND() function - it's important, because you have values like .33333 that don't sum in a happy way.

第2步:

ALTER TABLE ITEMSET ADD INDEX (Name)

SELECT a.Name Name1, b.Name Name2, SUM(Fuzzy_Value)/COUNT(*) Fuzzy_Value
FROM ITEMSET a JOIN ITEMSET b ON (a.Name != b.Name)
GROUP BY a.Name, b.Name
HAVING ROUND(SUM(Fuzzy_Value), 1) >= 0.1

Opps:我刚刚注意到您是在半年前问过的,所以我想继续下去是没有意义的.如果您仍然需要此答案,请发表评论.

Opps: I just noticed that you asked this half a year ago, so I guess there is no point in continuing. If you still need this answer leave a comment.

这篇关于使用SQL查询(模糊先验算法)进行数据挖掘操​​作-使用SQL对其进行编码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆