R - Match values from 2 dataframes based on multiple conditions (when the order of lookup IDs is random)

Problem description

I have two data frames:

df1 = data.frame(PersonId1=c(1,2,3,4,5,6,7,8,9,10,1),PersonId2=c(11,12,13,14,15,16,17,18,19,20,11),
             Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
             Event=c(1,1,1,1,2,2,2,2,2,2,2),
             Utility=c(20,-2,-5,10,30,2,1,.5,50,-1,60))


df2 = data.frame(PersonId1=c(11,15,9,1),PersonId2=c(1,5,19,11),
             Played_together = c(1,1,1,1),
             Event=c(1,2,2,2))

df1 looks like this:

      PersonId1 PersonId2 Played_together Event Utility
1          1        11               1     1    20.0
2          2        12               0     1    -2.0
3          3        13               0     1    -5.0
4          4        14               1     1    10.0
5          5        15               1     2    30.0
6          6        16               0     2     2.0
7          7        17               0     2     1.0
8          8        18               0     2     0.5
9          9        19               1     2    50.0
10        10        20               0     2    -1.0
11         1        11               1     2    60.0

and df2 looks like this:

  PersonId1 PersonId2 Played_together Event
1        11         1               1     1
2        15         5               1     2
3         9        19               1     2
4         1        11               1     2   

Note that df2 is not simply the rows of df1 where Played_together == 1. For example, the pair PersonId1 = 4, PersonId2 = 14 is not present in df2.

Also note that although df2 is a subset of df1, the order in which the two individuals appear within a row of df2 is random. For example, row 1 of df1 has PersonId1 = 1 and PersonId2 = 11 for event 1, but row 1 of df2 has PersonId1 = 11 and PersonId2 = 1 for event 1. These two cases are exactly the same, and I want to look up the values of Utility from df1 and add them to df2. The match has to be made per event. The final output should look like this:

  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60

I know that R has a merge function, but I do not know how to use it when the lookup IDs can appear in either order. I would appreciate any help. Thanks in advance.
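
To make the difficulty concrete, here is a minimal sketch of what a plain merge does with the data above. The name `naive` is introduced here only for illustration. Since `merge()` joins on all shared column names by default, it only finds the rows of df2 whose IDs happen to be in the same order as in df1, so two of the four rows are lost.

```r
# Data copied from the question.
df1 <- data.frame(
  PersonId1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
  PersonId2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11),
  Played_together = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
  Event = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  Utility = c(20, -2, -5, 10, 30, 2, 1, 0.5, 50, -1, 60)
)
df2 <- data.frame(
  PersonId1 = c(11, 15, 9, 1),
  PersonId2 = c(1, 5, 19, 11),
  Played_together = c(1, 1, 1, 1),
  Event = c(1, 2, 2, 2)
)

# With no `by` argument, merge() joins on every column name the two
# data frames share: PersonId1, PersonId2, Played_together and Event.
naive <- merge(df2, df1)

# Only the pairs (9, 19) and (1, 11) for event 2 are matched; the flipped
# pairs (11, 1) and (15, 5) have no counterpart in df1 in that order.
print(naive)
```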

Recommended answer

Here is what I came up with:

    library(dplyr)

    # Join once with the two ID columns swapped and once as-is, stack the
    # results, then keep only the rows where a Utility value was found.
    rbind(
      left_join(df2, df1,
                by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event")),
      left_join(df2, df1,
                by = c("PersonId1" = "PersonId1", "PersonId2" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event"))
    ) %>%
      filter(!is.na(Utility))

Basically, it seems that your data sometimes has the two person IDs flipped. We can bind two joins together, one with the ID columns swapped and one without, and then filter out the rows whose Utility is NA.

Your output looks like this:

  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60
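
A different way to get the same result, sketched here in base R rather than taken from the answer above, is to make the pair order irrelevant before joining: store the smaller ID and the larger ID of each pair in helper columns and merge on those. The names `add_pair_key`, `id_lo`, `id_hi`, `row_id`, `lookup` and `query` are hypothetical, chosen only for this example, and the sketch assumes each pair appears at most once per Event and Played_together value in df1 (otherwise the merge would return duplicate rows).

```r
# Data copied from the question.
df1 <- data.frame(
  PersonId1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
  PersonId2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11),
  Played_together = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
  Event = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  Utility = c(20, -2, -5, 10, 30, 2, 1, 0.5, 50, -1, 60)
)
df2 <- data.frame(
  PersonId1 = c(11, 15, 9, 1),
  PersonId2 = c(1, 5, 19, 11),
  Played_together = c(1, 1, 1, 1),
  Event = c(1, 2, 2, 2)
)

# Hypothetical helper: add an order-independent key for each pair,
# i.e. the smaller and the larger of the two person IDs.
add_pair_key <- function(d) {
  d$id_lo <- pmin(d$PersonId1, d$PersonId2)
  d$id_hi <- pmax(d$PersonId1, d$PersonId2)
  d
}

key_cols <- c("id_lo", "id_hi", "Played_together", "Event")

# Keep only the key columns and Utility from df1 so that its own
# PersonId1/PersonId2 columns do not clash with those of df2.
lookup <- add_pair_key(df1)[, c(key_cols, "Utility")]

query <- add_pair_key(df2)
query$row_id <- seq_len(nrow(query))  # remember the row order of df2

# Left join on the normalized key, then restore df2's order and columns.
result <- merge(query, lookup, by = key_cols, all.x = TRUE)
result <- result[order(result$row_id),
                 c("PersonId1", "PersonId2", "Played_together", "Event", "Utility")]
rownames(result) <- NULL
print(result)
```

Compared with binding two joins, this needs only one join and does not rely on filtering NA rows afterwards, so a pair in df2 with no counterpart in df1 would still be kept with Utility set to NA.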
