How to filter data in R?
Problem description
I have huge data sets with more than a million rows and some peculiar attributes. I need to filter the data while retaining its other properties.
My data looks like the following:
ID Prop1 Prop2 TotalProp
56891940 G02 G02 2
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
... and so on, for more than a million rows
What I want is to remove all rows of an ID when Prop1 and Prop2 are the same in every one of that ID's rows, as for 56891940. If at least one row of an ID has differing properties, all of its rows should be kept, even the rows where Prop1 and Prop2 are the same. So I want to retain every row of 56892558, 56892552, 56892559, and so on.
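Restated as a group-level test, an ID is kept when at least one of its rows has Prop1 != Prop2. A minimal base-R sketch of that test, assuming the sample rows are stored in a data frame named df1 with character columns (the same name the answers use); the name keep_ids is illustrative only:

```r
# Sample rows from the question (character columns assumed)
df1 <- data.frame(
  ID        = c(56891940, 56892558, 56892558, 56892558, 56892558,
                56892552, 56892552, 56892552, 56892559, 56892559, 56892559),
  Prop1     = c("G02", "A61", "A61", "G02", "A61",
                "B61", "B61", "B61", "B61", "B61", "B61"),
  Prop2     = c("G02", "G02", "A61", "A61", "A61",
                "B61", "B61", "A61", "G61", "B61", "B61"),
  TotalProp = c(2, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# IDs that have at least one row where the two properties differ
keep_ids <- unique(df1$ID[df1$Prop1 != df1$Prop2])

# Keep every row that belongs to one of those IDs, drop all others
result <- df1[df1$ID %in% keep_ids, ]
print(result)
```

This form uses one vectorised comparison and one %in% lookup instead of a function call per group, which may matter on very large tables.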
My final output should look like:
ID Prop1 Prop2 TotalProp
56892558 A61 G02 4
56892558 A61 A61 4
56892558 G02 A61 4
56892558 A61 A61 4
56892552 B61 B61 3
56892552 B61 B61 3
56892552 B61 A61 3
56892559 B61 G61 3
56892559 B61 B61 3
56892559 B61 B61 3
You may try
library(data.table)
# group by ID and return the group's rows (.SD) only if any row has Prop1 != Prop2
setDT(df1)[, .SD[any(Prop1 != Prop2)], ID]
# ID Prop1 Prop2 TotalProp
# 1: 56892558 A61 G02 4
# 2: 56892558 A61 A61 4
# 3: 56892558 G02 A61 4
# 4: 56892558 A61 A61 4
# 5: 56892552 B61 B61 3
# 6: 56892552 B61 B61 3
# 7: 56892552 B61 A61 3
# 8: 56892559 B61 G61 3
# 9: 56892559 B61 B61 3
#10: 56892559 B61 B61 3
Or, as @Frank suggested:
# if() returns NULL for IDs without a differing row, and data.table drops those groups
setDT(df1)[, if (any(Prop1 != Prop2)) .SD, ID]
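The if() variant is a split-apply-combine pattern: each ID's rows come back either whole or not at all. A rough base-R analogue, shown only to illustrate the idea (it is likely much slower than data.table on millions of rows) and assuming the same df1 with character columns:

```r
# Sample rows from the question (character columns assumed)
df1 <- data.frame(
  ID        = c(56891940, 56892558, 56892558, 56892558, 56892558,
                56892552, 56892552, 56892552, 56892559, 56892559, 56892559),
  Prop1     = c("G02", "A61", "A61", "G02", "A61",
                "B61", "B61", "B61", "B61", "B61", "B61"),
  Prop2     = c("G02", "G02", "A61", "A61", "A61",
                "B61", "B61", "A61", "G61", "B61", "B61"),
  TotalProp = c(2, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Split: one data frame per ID (split() orders the groups by sorted ID)
groups <- split(df1, df1$ID)

# Apply: keep a group only if at least one of its rows has Prop1 != Prop2
kept <- Filter(function(g) any(g$Prop1 != g$Prop2), groups)

# Combine: stack the surviving groups and reset the row names
result <- do.call(rbind, kept)
rownames(result) <- NULL
print(result)
```

Unlike the data.table call, split() orders the groups by sorted ID, so the rows come back in a different order than in the input.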
A similar option using dplyr:
library(dplyr)
# keep all rows of each ID in which at least one row has Prop1 != Prop2
df1 %>%
  group_by(ID) %>%
  filter(any(Prop1 != Prop2))
Or using ave from base R:
# ave() spreads each ID's any() result back to every row of that ID
df1[with(df1, ave(Prop1 != Prop2, ID, FUN = any)), ]
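One caveat for all of the comparisons above: if Prop1 and Prop2 were read as factors (stringsAsFactors = TRUE, the default for data.frame() and read.table() before R 4.0.0), comparing them directly stops with "level sets of factors are different" whenever the two columns have different levels, as in this sample where G61 appears only in Prop2. A minimal sketch of the ave approach that converts to character first and also shows the intermediate per-row vector; the names differs and keep are illustrative only:

```r
# Sample rows from the question, deliberately stored as factors here
df1 <- data.frame(
  ID        = c(56891940, 56892558, 56892558, 56892558, 56892558,
                56892552, 56892552, 56892552, 56892559, 56892559, 56892559),
  Prop1     = c("G02", "A61", "A61", "G02", "A61",
                "B61", "B61", "B61", "B61", "B61", "B61"),
  Prop2     = c("G02", "G02", "A61", "A61", "A61",
                "B61", "B61", "A61", "G61", "B61", "B61"),
  TotalProp = c(2, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3),
  stringsAsFactors = TRUE
)

# Row-level test: compare the labels, not the factor codes
differs <- as.character(df1$Prop1) != as.character(df1$Prop2)

# Group-level test: any() per ID, recycled back to every row of that ID
keep <- ave(differs, df1$ID, FUN = any)
print(keep)

result <- df1[keep, ]
print(result)
```

With character columns the as.character() calls are harmless no-ops, so the same code works either way.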