R data.table: Filter for rows by condition in multiple variables

Published: 2020/10/15 21:09:30 | Tags: r, data.table

Problem description

I have a filtering problem with the following data.table and really hope that someone can help me with it. I am not sure whether there is an easy way to do this, and I hope it is not too much to ask. This is my problem:

A   B     C  Area
aa  M+H   1  127427
aa  M+H   2  204051.5
aa  M+Na  1  6855539.48777
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221
bb  M+K   2  61715568

For each A B group (aa M+H, aa M+Na, bb M+H, bb M+K), all rows with C > 1 should be filtered out if their Area value is higher than that of the row with the same A B combination and C = 1 (each A B C combination exists only once in the table). After that step the following rows should be left:

A   B     C  Area
aa  M+H   1  127427
aa  M+Na  1  6855539.48777
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221
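
A minimal sketch of this first rule in data.table, assuming every A B group contains a row with C = 1 (as it does in the sample data). The name step1 is introduced here for illustration and does not come from the original post.

```r
library(data.table)

# Sample table from the question
DT <- data.table(
  A    = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
  B    = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K", "M+K"),
  C    = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
  Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650,
           214221, 11357158, 2140221, 61715568)
)

# Within each (A, B) group, keep the C == 1 row plus every row whose Area
# does not exceed the Area of that C == 1 row.
# Assumption: each group has exactly one C == 1 row.
step1 <- DT[, .SD[C == 1L | Area <= Area[C == 1L][1L]], by = .(A, B)]
```

Grouping with by = .(A, B) keeps the comparison local to each group, so no explicit join is needed for this step.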

After that I would like to filter out all rows that are in the same A C group (aa 1, aa 2, bb 1, bb 2) but have a higher Area value than the row with "M+H" as its B value. So this should be left:

A   B     C  Area
aa  M+H   1  127427
aa  M+Na  2  6469689
bb  M+H   1  15330650
bb  M+H   2  214221
bb  M+H   3  11357158
bb  M+K   1  2140221
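
The second rule can be sketched the same way, grouping by A and C instead. This sketch starts from the seven rows left after the first step and assumes that an A C group without an "M+H" row is kept unchanged, which is what the expected table above implies for aa M+Na 2. The names step1, step2, ref and keep are illustrative only.

```r
library(data.table)

# Rows remaining after the first filtering step (copied from the question)
step1 <- data.table(
  A    = c("aa", "aa", "aa", "bb", "bb", "bb", "bb"),
  B    = c("M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K"),
  C    = c(1L, 1L, 2L, 1L, 2L, 3L, 1L),
  Area = c(127427, 6855539.48777, 6469689, 15330650,
           214221, 11357158, 2140221)
)

# Within each (A, C) group, drop rows whose Area exceeds the "M+H" Area.
# Assumption: groups with no "M+H" row are left untouched.
step2 <- step1[, {
  ref  <- Area[B == "M+H"]
  keep <- if (length(ref) == 0L) rep(TRUE, .N) else Area <= ref[1L]
  .(B = B[keep], Area = Area[keep])
}, by = .(A, C)]

# Restore the original column and row order
setcolorder(step2, c("A", "B", "C", "Area"))
setorder(step2, A, B, C)
```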

And in the end, get rid of all A B groups (aa M+H, aa M+Na, bb M+H, bb M+K) that do not have a row with a value of 1 in C left. So there should only be:

A   B    C  Area
aa  M+H  1  127427
bb  M+H  1  15330650
bb  M+H  2  214221
bb  M+H  3  11357158
bb  M+K  1  2140221
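
This last rule is a whole-group filter. A possible sketch, starting from the six rows left after the second step (step2 and step3 are illustrative names):

```r
library(data.table)

# Rows remaining after the second filtering step (copied from the question)
step2 <- data.table(
  A    = c("aa", "aa", "bb", "bb", "bb", "bb"),
  B    = c("M+H", "M+Na", "M+H", "M+H", "M+H", "M+K"),
  C    = c(1L, 2L, 1L, 2L, 3L, 1L),
  Area = c(127427, 6469689, 15330650, 214221, 11357158, 2140221)
)

# Return the group's rows only if the group still has a C == 1 row;
# groups for which the `if` yields NULL are dropped from the result.
step3 <- step2[, if (any(C == 1L)) .SD, by = .(A, B)]
```

Here aa M+Na is removed because its only remaining row has C = 2.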

I was trying to get this done using data.table, but if someone tells me that dplyr is much better for it, I would also be happy with a solution there. Anyway, thank you very much for your time and effort!

Yasel

Recommended answer

Welcome!

Following your instructions, I arrive at a different result than yours, but you might be able to adapt it to your needs:

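library(data.table)

# Sample data from the question
DT <- data.table(stringsAsFactors=FALSE,
                 A = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
                 B = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K",
                       "M+K"),
                 C = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
                 Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650, 214221,
                          11357158, 2140221, 61715568)
)

# Step 1: join every row to the C == 1 row of its (A, B) group; keep the
# C == 1 rows and the rows whose Area is below that reference (i.Area).
DT <- DT[DT[C==1], on=.(A, B)][i.Area-Area > 0 | C==1]
DT[, c("i.C", "i.Area") := NULL]

# Step 2: join to the "M+H" row of each (A, C) group; keep rows whose Area
# is at least the "M+H" Area. (A, C) groups without an "M+H" row drop out.
DT <- DT[DT[B=="M+H"], on=.(A, C)][i.Area-Area <= 0]
DT[, c("i.B", "i.Area") := NULL]

# Step 3: keep only the (A, B) groups that still contain a C == 1 row.
DT <- DT[DT[C==1], on=.(A, B)]
DT[, c("i.C", "i.Area") := NULL]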
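
The second join above keeps rows whose Area is at least the "M+H" Area and, because only rows matching an "M+H" row come back from the join, it drops A C groups that have no "M+H" row. That appears to be why the result differs from the table in the question. The following is a hedged sketch of a variant that seems to follow the question's wording more closely; s1, s2, s3, ref and ref_area are names introduced here, and keeping groups without an "M+H" row is an assumption read off the expected output.

```r
library(data.table)

DT <- data.table(
  A    = c("aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb", "bb"),
  B    = c("M+H", "M+H", "M+Na", "M+Na", "M+H", "M+H", "M+H", "M+K", "M+K"),
  C    = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L),
  Area = c(127427, 204051.5, 6855539.48777, 6469689, 15330650,
           214221, 11357158, 2140221, 61715568)
)

# Step 1, unchanged from the answer
s1 <- DT[DT[C == 1L], on = .(A, B)][i.Area - Area > 0 | C == 1L]
s1[, c("i.C", "i.Area") := NULL]

# Step 2, changed: look up the "M+H" Area for every row of s1. Rows in
# (A, C) groups without an "M+H" row get ref_area = NA and are kept
# (assumption); otherwise keep rows that do not exceed the "M+H" Area.
ref <- s1[B == "M+H", .(A, C, ref_area = Area)]
s2  <- ref[s1, on = .(A, C)][is.na(ref_area) | Area <= ref_area]
s2[, ref_area := NULL]
setcolorder(s2, c("A", "B", "C", "Area"))

# Step 3, unchanged from the answer
s3 <- s2[s2[C == 1L], on = .(A, B)]
s3[, c("i.C", "i.Area") := NULL]
```

With the sample data, s3 should contain the five rows listed at the end of the question.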
