R data.table: Filter for rows by condition in multiple variables
Question
I have a filtering problem with the following data.table and hope someone can help me with it. I am not sure whether there is an easy way to do this. Here is my problem:
A B C Area
aa M+H 1 127427
aa M+H 2 204051.5
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
bb M+K 2 61715568
For each A B group (aa M+H, aa M+Na, bb M+H, bb M+K), all rows with C > 1 should be filtered out if their Area value is higher than that of the row with the same A B combination and C = 1 (each A B C combination exists only once in the table). After that step the following rows should be left:
A B C Area
aa M+H 1 127427
aa M+Na 1 6855539.48777
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
After that, I would like to filter out all rows that are in the same A C group (aa 1, aa 2, bb 1, bb 2) but have a higher Area value than the row with "M+H" as its B value. So this should be left:
A B C Area
aa M+H 1 127427
aa M+Na 2 6469689
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
Finally, get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that no longer have a row with a value of 1 in C. So only these rows should remain:
A B C Area
aa M+H 1 127427
bb M+H 1 15330650
bb M+H 2 214221
bb M+H 3 11357158
bb M+K 1 2140221
I was trying to get this done using data.table, but if someone tells me that dplyr is much better suited for it, I would also be happy with a solution there. Anyway, thank you very much for your time and effort!
Yasel
Answer
Welcome!
Following your instructions I arrive at a different result from yours, but you might be able to adapt it to your needs:
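library(data.table)

# Rebuild the example table from the question
DT <- data.table(stringsAsFactors=FALSE,
  A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
  B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
        "M+K"),
  C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
  Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
           11357158, 2140221, 61715568)
)

# Step 1: join every row to the C == 1 row of its (A, B) group (i.Area),
# then keep the C == 1 rows and the rows whose Area is below that reference
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]

# Step 2: join every row to the "M+H" row of its (A, C) group (i.Area).
# As written, the filter keeps rows whose Area is at least the "M+H" Area;
# use >= 0 instead to drop the rows above it. Rows whose (A, C) group has
# no "M+H" row are lost in the join.
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]

# Step 3: keep only the (A, B) groups that still contain a C == 1 row
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]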
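If the intermediate tables listed in the question are the goal, one possible alternative is to compute a per-group reference Area and filter on it instead of joining. The following is only a sketch under two assumptions that the question implies but does not state outright: rows equal to the reference Area are kept, and rows whose (A, C) group has no "M+H" row are kept rather than dropped (this is what leaves aa M+Na 2 in place after the second step). The names ref, res, step1, step2 and step3 are made up for illustration.

```r
library(data.table)

DT <- data.table(
  A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
  B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K", "M+K"),
  C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
  Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
           11357158, 2140221, 61715568)
)

res <- copy(DT)  # work on a copy so DT itself is not modified by :=

# Step 1: per (A, B) group, the reference is the Area of the C == 1 row
# (NA if the group has no such row). Drop C > 1 rows above the reference.
res[, ref := Area[C == 1L][1L], by = .(A, B)]
step1 <- res[C == 1L | is.na(ref) | Area <= ref]

# Step 2: per (A, C) group, the reference is the Area of the "M+H" row.
# Assumption: groups without an "M+H" row (ref is NA) are kept unchanged.
step1[, ref := Area[B == "M+H"][1L], by = .(A, C)]
step2 <- step1[is.na(ref) | Area <= ref]

# Step 3: keep only the (A, B) groups that still contain a C == 1 row
step3 <- step2[, if (any(C == 1L)) .SD, by = .(A, B)]
step3[, ref := NULL]

print(step3)
```

On the example data, step1, step2 and step3 should contain the 7, 6 and 5 rows shown in the question. The difference from the join-based code is that a join of the form X[Y, on = ...] only returns rows of X that have a partner in Y, whereas the grouped reference leaves unmatched rows in place.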