How to get items for both LHS and RHS for only specific columns in arules?

Question

With the apriori function, I want the resulting rules to contain only these two items on the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. The RHS should contain only items from the column Product. For instance:

#   lhs                   rhs                                          support confidence     lift
# 1 {HouseOwnerFlag=0}    => {Product=SV 16xDVD M360 Black}            0.2500000  0.2500000 1.000000
# 2 {HouseOwnerFlag=1}    => {Product=Adventure Works 26" 720p}        0.2500000  0.2500000 1.000000
# 3 {HouseOwnerFlag=0}    => {Product=Litware Wall Lamp E3015 Silver}  0.1666667  0.3333333 1.333333
# 4 {HouseOwnerFlag=1}    => {Product=Contoso Coffee Maker 5C E0900}   0.1666667  0.3333333 1.333333

Part of the answer is solved in this question: R arules, mine only rules from specific column

So now I use the following:
rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this from that other SO question to ensure that only the Product column is on the RHS:
inspect( subset( rules, subset = rhs %pin% "Product=" ) )

The result looks like this:

#   lhs                                                                  rhs                                          support confidence     lift
# 1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works }    => {Product=SV 16xDVD M360 Black}            0.2500000  0.2500000 1.000000
# 2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video }  => {Product=Adventure Works 26" 720p}        0.2500000  0.2500000 1.000000
# 3 {BrandName=Southridge Video, NumberChildrenAtHome=0 }                 => {Product=Litware Wall Lamp E3015 Silver}  0.1666667  0.3333333 1.333333
# 4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 }       => {Product=Contoso Coffee Maker 5C E0900}   0.1666667  0.3333333 1.333333

So apparently the LHS can contain every possible column, not just HouseOwnerFlag as I specified. From other Stack Overflow questions, I see that I can put default="rhs" in the apriori call, like so:
rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs"))

Then upon inspecting (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but they do indeed contain only HouseOwnerFlag on the LHS:

#   lhs                   rhs                           support   confidence lift
# 1 {HouseOwnerFlag=0}    => {MaritalStatus=S}          0.2500000 0.2500000  1.000000
# 2 {HouseOwnerFlag=1}    => {Gender=M}                 0.2500000 0.2500000  1.000000
# 3 {HouseOwnerFlag=0}    => {NumberChildrenAtHome=0}   0.1666667 0.3333333  1.333333
# 4 {HouseOwnerFlag=1}    => {Gender=M}                 0.1666667 0.3333333  1.333333

However, nothing from the column Product appears on the RHS. So there is no use in inspecting it with subset, as of course it would return nothing. I tested it several times with different support values to see whether Product would appear, but the same 7 rules remain.

So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?

You can reproduce this problem by downloading this test dataset from https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0. Mind you, I only took the first 20 rows from a huge dataset, so unfortunately the output here won't have the same product names as the example I displayed above. But the problem remains the same. I want to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and the column Product on the RHS.

Recommended answer

At first it seemed that one can't constrain lhs and rhs at once (I had not done so either before playing with your data), but you can use subset. I was wrong: you can also constrain lhs and rhs at once, see below for another solution. I keep Solution 1 because in some cases it might be useful to compute a bigger set and then split by the left-hand side.

Solution 1:

rules_sales <- apriori(sales, 
                   parameter=list(support =0.001, confidence =0.5, minlen=2, maxlen=2), 
                   appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), 
                                     default="rhs"))

rules_subset <- subset(rules_sales, (rhs %in% paste0("Product=", unique(sales$Product))))
inspect(rules_subset)

gives:

  lhs                   rhs                                                support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV DVD Movies E100 Yellow}                   0.05        0.5   10
2 {HouseOwnerFlag=0} => {Product=Fabrikam Refrigerator 4.6CuFt E2800 Grey}    0.05        0.5    5
3 {HouseOwnerFlag=1} => {Product=Contoso SLR Camera M144 Gold}                0.10        0.5    5

But you should be careful about your low support:

Warning in apriori(sales, parameter = list(support = 0.001, confidence = 0.5,  :
  You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support.
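The warning arises because the relative minimum support is converted into an absolute number of transactions. A rough sketch of that arithmetic, assuming the 20-row sample mentioned in the question and assuming the count is obtained by rounding down (the variable names are illustrative only):

```r
# Number of transactions in the sample (the question mentions the first 20 rows).
n_transactions <- 20

# Relative minimum support used in the apriori call above.
min_support <- 0.001

# Absolute count implied by the relative support
# (assumption: the product is rounded down to a whole number).
abs_count <- floor(min_support * n_transactions)
print(abs_count)

# Smallest relative support that requires an itemset to occur
# in at least two of the transactions.
min_support_for_two <- 2 / n_transactions
print(min_support_for_two)
```

With a threshold this low, every itemset that occurs at all counts as frequent, which is why the number of candidate rules (and memory use) can explode on the full dataset.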

Solution 2:

I was tricked by the definition of the parameter default. Using lhs and rhs at once tells each item assigned to one of them that it can only be used on that side. The parameter default is automatically set to "both", so all other items not listed in lhs/rhs can be used on both sides (explanation of the appearance parameter as implemented in the R package: http://www.inside-r.org/node/86290; I realised that it must be possible when reading the manual of the original C implementation: http://www.borgelt.net/doc/apriori/apriori.html#appearin). You have to set default="none"; then you can constrain lhs and rhs without using a subset later.

rules_sales <- apriori(sales, 
                       parameter=list(support =0.001, confidence =0.5, minlen=2, maxlen=2), 
                       appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), 
                                         rhs=paste0("Product=", unique(sales$Product)), 
                                         default="none"))
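To try the effect of default="none" without the original dataset, here is a minimal sketch on a made-up data frame. It assumes the arules package is installed; toy_sales and its values are hypothetical, and the item labels are collected from itemLabels() by prefix, which is just one alternative to the paste0() construction above.

```r
library(arules)

# Hypothetical toy data: column names mirror the question, values are made up.
toy_sales <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 1, 1, 1, 1, 0)),
  Gender         = factor(c("M", "F", "M", "F", "M", "F", "M", "F")),
  Product        = factor(c("Lamp", "Lamp", "Camera", "TV",
                            "TV", "TV", "Camera", "Lamp"))
)

# Each column=value pair becomes one item, e.g. "HouseOwnerFlag=0".
toy_trans <- as(toy_sales, "transactions")

# Collect the item labels that belong to each column by their prefix.
all_items <- itemLabels(toy_trans)
lhs_items <- grep("^HouseOwnerFlag=", all_items, value = TRUE)
rhs_items <- grep("^Product=", all_items, value = TRUE)

# default = "none" excludes every item not listed in lhs or rhs (here: Gender).
toy_rules <- apriori(
  toy_trans,
  parameter  = list(support = 0.1, confidence = 0.5, minlen = 2, maxlen = 2),
  appearance = list(lhs = lhs_items, rhs = rhs_items, default = "none"),
  control    = list(verbose = FALSE)
)

inspect(toy_rules)
```

Only rules of the form {HouseOwnerFlag=...} => {Product=...} should remain, so no subset step is needed afterwards. Selecting labels by prefix also works when the data is only available as a transactions object and the raw Product column is not at hand.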
