Select only the first row when merging data frames with multiple matches

Problem description

I have two data frames, "data" and "scores", and want to merge them on the "id" column:

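library(dplyr)  # provides semi_join()

data = data.frame(id = c(1,2,3,4,5),
                  state = c("KS","MN","AL","FL","CA"))
scores = data.frame(id = c(1,1,1,2,2,3,3,3),
                    score = c(66,75,78,86,85,76,75,90))
merge(data, scores, by = "id")      # one row per match: 8 rows
semi_join(data, scores, by = "id")  # rows of data with a match; no score column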

In the "scores" data, some "id" values have multiple observations, and each match produces its own row in the join result. See ?merge:

If there is more than one match, all possible matches contribute one row each.

However, I want to keep only the row corresponding to the first match from the scores table.

A semi join would have been nice, but it returns only the columns of the left table, so I cannot select the score from the right table.

Any suggestions?

Recommended answer

Using data.table along with mult = "first" and nomatch = 0L:

require(data.table)
setDT(scores); setDT(data) # convert to data.tables by reference

scores[data, mult = "first", on = "id", nomatch=0L]
#    id score state
# 1:  1    66    KS
# 2:  2    86    MN
# 3:  3    76    AL

For each row of data's id column, the matching rows in scores' id column are found, and only the first one is retained (because mult = "first"). Rows of data that have no match in scores are dropped (because of nomatch = 0L).
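
For comparison, the same "first match only" behaviour can be sketched in base R without data.table. This is not part of the answer above, only an illustration under the assumption that "first" means the first row per id in the order the scores appear. The names data_df, scores_df, first_scores and result are hypothetical and chosen so they do not clash with the tables converted by setDT() above.

```r
# Rebuild the example tables as plain data frames (hypothetical names).
data_df <- data.frame(id = c(1, 2, 3, 4, 5),
                      state = c("KS", "MN", "AL", "FL", "CA"))
scores_df <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                        score = c(66, 75, 78, 86, 85, 76, 75, 90))

# duplicated() flags every repeat of an id after its first occurrence,
# so negating it keeps exactly the first row per id.
first_scores <- scores_df[!duplicated(scores_df$id), ]

# merge() performs an inner join by default, so ids 4 and 5,
# which have no score, are dropped from the result.
result <- merge(data_df, first_scores, by = "id")
print(result)
#   id state score
# 1  1    KS    66
# 2  2    MN    86
# 3  3    AL    76
```

The column order differs from the data.table result (state comes before score) because merge() places the columns of its first argument first.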
