Checking for equality
Question
I want to check the equality of a dataset. The dataset looks like this:
Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Notifs <- c(10,10,20,55,63,67,71,73,73,73,81,81,83,32,32,32,32,
            47,48,45,45,45,51,51,55,56,69,65,88)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
           "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
           "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
           "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)
df <- data.frame(Equips, Notifs, Comps, rank)
The data frame should be read row by row.
My problem is the following: I have a very big dataset, and I want to check whether the Comps within one Equips are the same across all ranks.
To be specific: Equips 1 has ranks 1 and 2, and I want to check whether some component is listed in both rank 1 and rank 2 (in this example: YES, Motor).
Equips 2 has 3 ranks, and here no Comps is listed in all of the first, second and third ranks.
Equips 5 has 4 ranks, and here there is a Comps that occurs in every rank, namely "Lichtmaschine".
So what is my desired output? It would be enough to get an output with the Equips number and TRUE or FALSE for each (like the summary command).
The output should be TRUE if there is a Comps that is listed in every rank (within one Equips).
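Put another way, an Equips should be TRUE exactly when the sets of components of its ranks have a non-empty intersection. A minimal base R sketch of that criterion for a single Equips follows. It rebuilds the example data with `stringsAsFactors = FALSE` and without `Notifs` (the check does not use it), and `has_common_comp` is a hypothetical helper name, not something from the question:

```r
# Example data from the question (Notifs omitted: the check does not use it)
df <- data.frame(
  Equips = c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8),
  Comps  = c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
             "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
             "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
             "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube"),
  rank   = c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if at least one component occurs in every rank of d
has_common_comp <- function(d) {
  comps_per_rank <- split(d$Comps, d$rank)      # one character vector per rank
  common <- Reduce(intersect, comps_per_rank)   # components shared by all ranks
  length(common) > 0
}

has_common_comp(df[df$Equips == 1, ])   # TRUE: Motor is in ranks 1 and 2
has_common_comp(df[df$Equips == 2, ])   # FALSE: no component is in all 3 ranks
has_common_comp(df[df$Equips == 5, ])   # TRUE: Lichtmaschine is in ranks 1 to 4
```

An Equips with a single rank is TRUE under this definition, because every component of that rank trivially occurs in all of its ranks.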
Some further notes: the dataset is very big, so I need an automated solution, and if possible one that uses only base R without any packages.
Thank you very much for your effort.
Charlie
Recommended answer
Here is an answer that uses the plyr package:
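library(plyr)
ddply(df, .(Equips), function(d) {
  # number of distinct ranks in this Equips
  nb.ranks <- length(unique(d$rank))
  # rank x component matrix: TRUE if the component occurs in that rank
  tab <- table(d$rank, d$Comps) > 0
  # number of ranks in which each component occurs
  tab <- margin.table(tab, 2)
  # TRUE if at least one component occurs in every rank
  return(any(tab >= nb.ranks))
})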
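This gives the following. The output shows Equips 1 to 5 only; with the example data above, Equips 6, 7 and 8, which have a single rank each, also return TRUE: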
  Equips    V1
1      1  TRUE
2      2 FALSE
3      3 FALSE
4      4 FALSE
5      5  TRUE
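To see what the function computes, the sketch below runs its steps by hand on the rows of Equips 5 only, copied from the example data with `stringsAsFactors = FALSE` (`d`, `present` and `counts` are illustrative names). The table marks which component occurs in which rank, `margin.table(..., 2)` counts the ranks per component, and the Equips is TRUE if any count reaches the number of ranks:

```r
# Rows of Equips 5 from the question's data
d <- data.frame(
  Comps = c("Schraube", "Lichtmaschine", "Bremse", "Lichtmaschine",
            "Schraube", "Lichtmaschine", "Lichtmaschine"),
  rank  = c(1, 1, 1, 2, 2, 3, 4),
  stringsAsFactors = FALSE
)

nb.ranks <- length(unique(d$rank))      # 4 distinct ranks

# TRUE where a component (column) occurs in a rank (row)
present <- table(d$rank, d$Comps) > 0
print(present)

# Number of ranks in which each component occurs
counts <- margin.table(present, 2)
print(counts)                           # Bremse 1, Lichtmaschine 4, Schraube 2

any(counts >= nb.ranks)                 # TRUE: Lichtmaschine occurs in all 4 ranks
```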
If you really don't want to use plyr, you can use the base R by function:
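by(df, df$Equips, function(d) {
  nb.ranks <- length(unique(d$rank))    # distinct ranks in this Equips
  tab <- table(d$rank, d$Comps) > 0     # TRUE if a component occurs in a rank
  tab <- margin.table(tab, 2)           # ranks per component
  return(any(tab >= nb.ranks))          # some component in every rank?
})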
df$Equips: 1
[1] TRUE
--------------------------------------------------------
df$Equips: 2
[1] FALSE
--------------------------------------------------------
df$Equips: 3
[1] FALSE
--------------------------------------------------------
df$Equips: 4
[1] FALSE
--------------------------------------------------------
df$Equips: 5
[1] TRUE
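For a very large dataset, an alternative base R formulation works on whole columns instead of subsetting the data frame once per Equips: keep each (Equips, rank, Comps) combination once, count the distinct ranks per component and per Equips, and compare. This is a sketch of a different technique, not the answer's method; the intermediate names (`u`, `comp_ranks`, `n_ranks`, `m`, `res`) are hypothetical, and whether it is faster than `by` on your data is an assumption worth timing:

```r
# Example data from the question (Notifs omitted: the check does not use it)
df <- data.frame(
  Equips = c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8),
  Comps  = c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
             "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
             "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
             "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube"),
  rank   = c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1),
  stringsAsFactors = FALSE
)

# 1. Keep each (Equips, rank, Comps) combination once,
#    so a component listed twice in the same rank is counted once
u <- unique(df[c("Equips", "rank", "Comps")])

# 2. For every (Equips, Comps) pair: number of distinct ranks it occurs in
comp_ranks <- aggregate(rank ~ Equips + Comps, data = u, FUN = length)
names(comp_ranks)[names(comp_ranks) == "rank"] <- "n_comp_ranks"

# 3. Number of distinct ranks per Equips
n_ranks <- aggregate(rank ~ Equips, data = unique(u[c("Equips", "rank")]), FUN = length)
names(n_ranks)[names(n_ranks) == "rank"] <- "n_ranks"

# 4. A component qualifies if it occurs in every rank of its Equips
m <- merge(comp_ranks, n_ranks, by = "Equips")
m$in_all_ranks <- m$n_comp_ranks == m$n_ranks

# 5. One TRUE/FALSE per Equips: does any component qualify?
res <- aggregate(in_all_ranks ~ Equips, data = m, FUN = any)
res
```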
If you want to summarize the result, you can do something like this:
result <- by(df, df$Equips, function(d) {
  nb.ranks <- length(unique(d$rank))    # count ranks, not components
  tab <- table(d$rank, d$Comps) > 0
  tab <- margin.table(tab, 2)
  return(any(tab >= nb.ranks))
})
# number of Equips checked and how many have a component in every rank
data.frame(nb.equips = length(result), nb.matched = sum(result))
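This gives the following, shown for Equips 1 to 5 only (the full example data has 8 Equips, 5 of which match):
  nb.equips nb.matched
1         5          2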
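The question asked for the Equips number with TRUE or FALSE, so the `by` result can also be turned into a two-column data frame. This is a sketch that relies on base R's `by` returning a one-dimensional array named by the Equips values when the function returns one value per group; `per_equip` is a hypothetical name, and `summary(per_equip$matched)` would report similar FALSE/TRUE counts:

```r
# Example data from the question (Notifs omitted: the check does not use it)
df <- data.frame(
  Equips = c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8),
  Comps  = c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
             "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
             "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
             "Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube"),
  rank   = c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1),
  stringsAsFactors = FALSE
)

# Same check as in the answer, run once per Equips with base R's by()
result <- by(df, df$Equips, function(d) {
  nb.ranks <- length(unique(d$rank))                     # distinct ranks in this Equips
  counts <- margin.table(table(d$rank, d$Comps) > 0, 2)  # ranks per component
  any(counts >= nb.ranks)                                # some component in every rank?
})

# One row per Equips: its number and TRUE/FALSE
per_equip <- data.frame(Equips = as.integer(names(result)),
                        matched = as.vector(result))
print(per_equip)

# How many Equips are FALSE and how many are TRUE
table(per_equip$matched)
```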