R data.table: Filter for rows by condition in multiple variables

Question

I have a filtering problem with the following data.table and really hope that someone can help me with it. I am not sure whether there is an easy way to do this, and I hope it is not too much to ask. This is my problem:

A   B     C  Area
aa  M+H   1  127427
aa  M+H   2  204051.5
aa  M+Na  1  6855539.48777
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221
bb  M+K   2  61715568

For each A B group (aa M+H, aa M+Na, bb M+H, bb M+K), all rows with C > 1 should be filtered out if their Area value is higher than that of the row with the same A B combination and C = 1 (each A B C combination exists only once in the table). After that step, the following rows should be left:

A   B     C  Area
aa  M+H   1  127427
aa  M+Na  1  6855539.48777
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221

After that, I would like to filter out all rows that are in the same A C group (aa 1, aa 2, bb 1, bb 2) but have a higher Area value than the row with "M+H" as its B value. So this should be left:

A   B     C  Area
aa  M+H   1  127427
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221

Finally, I want to get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that no longer have a row with C = 1. So only these rows should remain:

A   B    C  Area
aa  M+H  1  127427
bb  M+H  1  15330650
bb  M+H  2  214221
bb  M+H  3  11357158
bb  M+K  1  2140221

I was trying to get this done using data.table, but if someone tells me that dplyr is much better suited for it, I would also be happy with a solution there. Anyway, thank you very much for your time and effort!

Yasel

Recommended answer

Welcome!

Following your instructions, I arrive at a different result than yours, but you might be able to adapt it to your needs:

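library(data.table)

# Sample data from the question
DT <- data.table(stringsAsFactors=FALSE,
                 A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
                 B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
                       "M+K"),
                 C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
                 Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
                          11357158, 2140221, 61715568)
)

# Step 1: join every row to the C == 1 row of its A, B group (adds i.C, i.Area),
# then keep the C == 1 rows and the rows whose Area is below that reference.
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]

# Step 2: join every row to the "M+H" row of its A, C group (adds i.B, i.Area).
# As written, this keeps rows whose Area is at least the "M+H" Area; use
# `i.Area-Area >= 0` to drop rows with a higher Area, as the question describes.
# Rows in A, C groups without an "M+H" row are dropped by the join.
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]

# Step 3: keep only the A, B groups that still contain a C == 1 row.
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]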
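
If the goal is to reproduce the tables from the question exactly, one possible variant is to compute a per-group reference Area with `:=` and `by` instead of joining. This is only a sketch under a few assumptions: every A B group in the input has exactly one C == 1 row (as in the sample data), rows in an A C group without an "M+H" row are left alone in step 2 (which is what the question's second table suggests), and the names `res`, `ref` and `has_c1` are made up for illustration.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  A    = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
  B    = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K", "M+K"),
  C    = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
  Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
           11357158, 2140221, 61715568)
)

res <- copy(DT)  # work on a copy so DT stays untouched

# Step 1: per A, B group, take the Area of the C == 1 row as reference
# (NA if the group has no such row) and drop rows with a higher Area.
res[, ref := Area[C == 1][1], by = .(A, B)]
res <- res[C == 1 | Area <= ref]

# Step 2: per A, C group, take the Area of the "M+H" row as reference.
# Groups without an "M+H" row get NA and are kept unchanged (assumption).
res[, ref := Area[B == "M+H"][1], by = .(A, C)]
res <- res[is.na(ref) | Area <= ref]

# Step 3: keep only the A, B groups that still contain a C == 1 row.
res[, has_c1 := any(C == 1), by = .(A, B)]
res <- res[(has_c1)]

# Remove the helper columns again
res[, c("ref", "has_c1") := NULL]

print(res)
```

On the sample data this should leave the five rows listed at the end of the question (aa M+H 1, bb M+H 1 to 3, bb M+K 1). The filter keeps the original row order, so no re-sorting is needed afterwards.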
