Detecting rows with the same combination of numbers in the first two columns, and selecting the one with the highest number in the third column

Views: 110 | Published: 2017/3/26 3:28:54 | Tags: r, postgresql, dataframe


Problem description

I have a data.frame with only three columns but many thousands of rows. The first and second columns hold numerical IDs, and the combination of the two represents a link (e.g. A-B is the same link as B-A).

Now I'd like to remove all rows that duplicate a link, keeping for each link only the row with the highest value in the third column.

Here is a simple example.

My input data.frame:

1   2    100
102 100  20000
100 102  23131
10  19 124444
10  15   1244
19  10   1242
10  19   5635
2   1    666
1   2     33
100 110     23

My goal is to get:

100 102  23131
10  19 124444
10  15   1244
2   1    666
100 110     23

I'm trying to find a solution in R; otherwise PostgreSQL would be fine too. Thanks a lot!
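To make "duplicate link" concrete, here is a minimal base R sketch (an illustration, not part of the original question) that builds an order-independent key for each row and flags every row whose link occurs more than once. It assumes the columns are named V1, V2 and V3, as in the answer below; the helper `link_key` and the columns `key` and `dup_link` are hypothetical names.

```r
# Sample input from the question (column names V1, V2, V3 are assumed)
DF <- data.frame(
  V1 = c(1, 102, 100, 10, 10, 19, 10, 2, 1, 100),
  V2 = c(2, 100, 102, 19, 15, 10, 19, 1, 2, 110),
  V3 = c(100, 20000, 23131, 124444, 1244, 1242, 5635, 666, 33, 23)
)

# Hypothetical helper: put the smaller ID first, so "1-2" and "2-1" get the same key
link_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "-")

DF$key <- link_key(DF$V1, DF$V2)

# TRUE for every row whose link appears more than once (in either direction)
DF$dup_link <- duplicated(DF$key) | duplicated(DF$key, fromLast = TRUE)

DF
```

On the sample data every row is flagged except `10 15 1244` and `100 110 23`, whose links occur only once.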

Recommended answer

The idea is similar to this one. You can create two additional columns with pmin and pmax, holding the smaller and the larger ID of each pair, and group by them as follows:

This is a data.table solution. If you don't want to use data.table, you can still apply the same idea, although it is highly unlikely that plain R code will be faster than the data.table solution.

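```r
library(data.table)

# Sample input from the question; replace with your own data.frame DF
DF <- data.frame(
  V1 = c(1, 102, 100, 10, 10, 19, 10, 2, 1, 100),
  V2 = c(2, 100, 102, 19, 15, 10, 19, 1, 2, 110),
  V3 = c(100, 20000, 23131, 124444, 1244, 1242, 5635, 666, 33, 23)
)

DT <- data.table(DF)
# put the min of V1,V2 in id1 and the max in id2, so A-B and B-A share a group
DT[, `:=`(id1 = pmin(V1, V2), id2 = pmax(V1, V2))]
# within each (id1, id2) group keep the row with the max V3
DT.OUT <- DT[, .SD[which.max(V3)], by = list(id1, id2)]
# remove the helper id1 and id2 columns
DT.OUT[, c("id1", "id2") := NULL]
DT.OUT

#     V1  V2     V3
# 1:   2   1    666
# 2: 100 102  23131
# 3:  10  19 124444
# 4:  10  15   1244
# 5: 100 110     23
```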
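If you would rather avoid data.table, the same pmin/pmax idea can be sketched in base R. This is a minimal sketch, assuming the columns are named V1, V2 and V3 and that ties in V3 within a link need no special handling; instead of grouping, it visits the rows from highest to lowest V3 and keeps the first occurrence of each link with `duplicated()`. The variable names `ord`, `keep` and `OUT` are illustrative only.

```r
# Sample input from the question (column names V1, V2, V3 are assumed)
DF <- data.frame(
  V1 = c(1, 102, 100, 10, 10, 19, 10, 2, 1, 100),
  V2 = c(2, 100, 102, 19, 15, 10, 19, 1, 2, 110),
  V3 = c(100, 20000, 23131, 124444, 1244, 1242, 5635, 666, 33, 23)
)

# Order-independent link: smaller ID in id1, larger ID in id2
id1 <- pmin(DF$V1, DF$V2)
id2 <- pmax(DF$V1, DF$V2)

# Visit rows from highest to lowest V3; the first time a link is seen
# is therefore its row with the maximum V3
ord  <- order(DF$V3, decreasing = TRUE)
keep <- ord[!duplicated(data.frame(id1, id2)[ord, ])]

# Restore the original row order and reset the row names
OUT <- DF[sort(keep), ]
rownames(OUT) <- NULL
OUT
```

On the sample data this returns the five rows of the desired output, in the same order as in the question, because `sort(keep)` puts the surviving rows back in their original order.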
