Checking for equality

Question

I want to check for equality within a data set. The data set looks like this:

Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
Notifs <- c(10,10,20,55,63,67,71,73,73,73,81,81,83,32,32,32,32,
47,48,45,45,45,51,51,55,56,69,65,88)
Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
"Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
"Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",
"Schraube","Lichtmaschine","Lichtmaschine","Motor","Ventil","Schraube")
rank <- c(1,1,2,1,2,3,1,2,2,2,3,3,4,1,1,1,1,2,3,1,1,1,2,2,3,4,1,1,1)

df <- data.frame(Equips,Notifs,Comps,rank)

The data frame should be read row by row.

My problem is the following: I have a very big data set, and I want to check whether, within one Equips, the same Comps value appears in all ranks.

To be specific: Equips 1 has ranks 1 and 2. I want to check whether there is a component that is listed in both rank 1 and rank 2 (in this example: yes).

Equips 2 has 3 ranks, and here there is no Comps value that is listed in the first, second, and third rank.

Equips 5 has 4 ranks, and here there is a Comps value that appears in every rank, namely "Lichtmaschine".

So what is my desired output? It would be enough to get an output that lists each Equips number together with TRUE or FALSE (similar to the summary command).

The output should be TRUE if there is a Comps value that is listed in every rank within one Equips.

Some additional notes: the data set is very big, so I need an automated solution and, if possible, one that uses only standard R without any packages.

Thank you very much for your effort.

Charlie

Recommended answer

Here is an answer which uses the plyr package:

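library(plyr)
ddply(df, .(Equips), function(d) {
  # number of distinct ranks within this Equips
  nb.ranks <- length(unique(d$rank))
  # rank x Comps table: TRUE where the component occurs in that rank
  tab <- table(d$rank, d$Comps) > 0
  # for each component, the number of ranks it occurs in
  tab <- margin.table(tab, 2)
  # TRUE if at least one component occurs in every rank
  return(sum(tab >= nb.ranks) > 0)
})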

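With the sample data above, this gives (Equips 6 to 8 have only one rank each, so they are trivially TRUE):

  Equips    V1
1      1  TRUE
2      2 FALSE
3      3 FALSE
4      4 FALSE
5      5  TRUE
6      6  TRUE
7      7  TRUE
8      8  TRUE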

If you really don't want to use plyr, you can use the base R by function:

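by(df, df$Equips, function(d) {
  # number of distinct ranks within this Equips
  nb.ranks <- length(unique(d$rank))
  # rank x Comps table: TRUE where the component occurs in that rank
  tab <- table(d$rank, d$Comps) > 0
  # for each component, the number of ranks it occurs in
  tab <- margin.table(tab, 2)
  # TRUE if at least one component occurs in every rank
  return(sum(tab >= nb.ranks) > 0)
})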

df$Equips: 1
[1] TRUE
-------------------------------------------------------- 
df$Equips: 2
[1] FALSE
-------------------------------------------------------- 
df$Equips: 3
[1] FALSE
-------------------------------------------------------- 
df$Equips: 4
[1] FALSE
-------------------------------------------------------- 
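df$Equips: 5
[1] TRUE
-------------------------------------------------------- 
df$Equips: 6
[1] TRUE
-------------------------------------------------------- 
df$Equips: 7
[1] TRUE
-------------------------------------------------------- 
df$Equips: 8
[1] TRUE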
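
Because the question asks for one TRUE or FALSE per Equips, the same check can also be written as an intersection of the component sets of each rank and collected into a plain data frame. The following is only a minimal base R sketch, not part of the original answer: `toy` is a small made-up data frame with the same column names as `df`, and `has_common`, `res`, and `out` are illustrative names.

```r
# Toy data with the same columns as df (values are made up for illustration)
toy <- data.frame(
  Equips = c(1, 1, 1, 2, 2, 2),
  Comps  = c("Motor", "Ventil", "Motor", "Schraube", "Ventil", "Motor"),
  rank   = c(1, 1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# TRUE if at least one component occurs in every rank of the group
has_common <- function(d) {
  # one character vector of components per rank
  comps_by_rank <- split(as.character(d$Comps), d$rank)
  # components shared by all ranks (a single rank is returned unchanged)
  length(Reduce(intersect, comps_by_rank)) > 0
}

# One logical value per Equips, named by the Equips value
res <- sapply(split(toy, toy$Equips), has_common)

# Flatten into a data frame: Equips 1 -> TRUE, Equips 2 -> FALSE
out <- data.frame(Equips = names(res), common = unname(res))
print(out)
```

Replacing `toy` with `df` should apply the same check to the full data set; an Equips with a single rank always yields TRUE under this definition.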

If you want to summarize the result, you can do something like this:

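result <- by(df, df$Equips, function(d) {
  # number of distinct ranks within this Equips
  nb.ranks <- length(unique(d$rank))
  # rank x Comps table: TRUE where the component occurs in that rank
  tab <- table(d$rank, d$Comps) > 0
  # for each component, the number of ranks it occurs in
  tab <- margin.table(tab, 2)
  # TRUE if at least one component occurs in every rank
  return(sum(tab >= nb.ranks) > 0)
})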


data.frame(nb.equips=dim(result), nb.matched=sum(result))

With the sample data above, this gives:

  nb.equips nb.matched
1         8          5
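
Since the question stresses that the data set is very big, it may be worth avoiding a function call per Equips. The sketch below is one possible base R alternative, not the answer's method: it counts, for each (Equips, Comps) pair, the number of distinct ranks the component occurs in and compares that with the number of ranks of the Equips. The data frame `toy` and the names `u`, `ranks_per_comp`, `n_ranks`, and `result` are assumptions made for illustration, and whether this is actually faster on a given data set would need to be measured.

```r
# Toy data with the same columns as df (values are made up for illustration)
toy <- data.frame(
  Equips = c(1, 1, 1, 2, 2, 2),
  Comps  = c("Motor", "Ventil", "Motor", "Schraube", "Ventil", "Motor"),
  rank   = c(1, 1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Drop duplicate (Equips, rank, Comps) rows so each combination counts once
u <- unique(toy[, c("Equips", "rank", "Comps")])

# Number of distinct ranks in which each component occurs, per Equips
ranks_per_comp <- aggregate(rank ~ Equips + Comps, data = u, FUN = length)

# Number of distinct ranks per Equips (named by the Equips value)
n_ranks <- sapply(split(u$rank, u$Equips), function(r) length(unique(r)))

# A component is "common" if it occurs in every rank of its Equips
in_all <- ranks_per_comp$rank == n_ranks[as.character(ranks_per_comp$Equips)]

# TRUE per Equips if any component is common
result <- tapply(unname(in_all), ranks_per_comp$Equips, any)
print(result)

# Summary in the same shape as above
data.frame(nb.equips = length(result), nb.matched = sum(result))
```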
