Match combinations of row values between 2 different data frames

Question

I have a data.frame containing the 16 possible combinations of 4 different cell markers:

> combinations_df

     FITC Cy3 TX_RED Cy5
 a    0   0      0   0
 b    1   0      0   0
 c    0   1      0   0
 d    1   1      0   0
 e    0   0      1   0
 f    1   0      1   0
 g    0   1      1   0
 h    1   1      1   0
 i    0   0      0   1
 j    1   0      0   1
 k    0   1      0   1
 l    1   1      0   1
 m    0   0      1   1
 n    1   0      1   1
 o    0   1      1   1
 p    1   1      1   1

I also have my "main" data.frame with 10 columns and thousands of rows.

> main_df
  a b FITC d Cy3 f TX_RED h Cy5 j
1 0 1    1 1   1 0      1 1   1 1
2 0 1    0 1   1 0      1 0   1 1
3 1 1    0 0   0 1      1 0   0 0
4 0 1    1 1   1 0      1 1   1 1
5 0 0    0 0   0 0      0 0   0 0
....

I want to compare each row of main_df against all 16 possible combinations in combinations_df. Then I want to create a new vector that I can later cbind to main_df as column 11.

Sample output

> phenotype
[1] "p" "o" "e" "p" "a"

I thought about nesting a while loop inside a for loop, checking each combinations_df row against each main_df row.

That sounds like it could work, but I have close to 1,000,000 rows in main_df, so I wanted to see if anybody had a better idea.

I forgot to mention that I only want to compare combinations_df to columns 3, 5, 7, and 9 of main_df. Those columns have the same names as the columns of combinations_df, but that might not be obvious.

Changed the sample output, since no "t" should be present.

Recommended answer

The dplyr solution is outrageously simple. First you need to put phenotype in combinations_df as an explicit variable, like this:

#   phenotype FITC Cy3 TX_RED Cy5
#1          a    0   0      0   0
#2          b    1   0      0   0
#3          c    0   1      0   0
#4          d    1   1      0   0
# etc
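
One way to get that layout in base R is sketched below. It assumes the lookup table stores the phenotype letters as row names, as in the question, and it rebuilds the table with expand.grid only so the snippet runs on its own; with your real combinations_df you would skip the first two statements.

```r
# Rebuild the 16-row lookup table from the question (assumed layout:
# FITC varies fastest, row names "a" through "p").
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combinations_df) <- letters[1:16]

# Copy the row names into an explicit first column, then reset the row names.
combinations_df <- data.frame(
  phenotype = rownames(combinations_df),
  combinations_df,
  stringsAsFactors = FALSE
)
rownames(combinations_df) <- NULL

head(combinations_df, 4)
```

If you already use the tidyverse, tibble::rownames_to_column(combinations_df, "phenotype") should do the same thing in one call.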

dplyr lets you join on multiple variables, so from here it's a one-liner to look up the phenotypes.

library(dplyr)
left_join(main_df, combinations_df, by=c("FITC", "Cy3", "TX_RED", "Cy5"))

#  a b FITC d Cy3 f TX_RED h Cy5 j phenotype
#1 0 1    1 1   1 0      1 1   1 1         p
#2 0 1    0 1   1 0      1 0   1 1         o
#3 1 1    0 0   0 1      1 0   0 0         e
#4 0 1    1 1   1 0      1 1   1 1         p
#5 0 0    0 0   0 0      0 0   0 0         a
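
For anyone who wants to run this end to end, here is a self-contained sketch using the five sample rows from the question. The way the two data frames are constructed (expand.grid and read.table) is my own assumption for illustration; only the left_join call comes from the answer. Since the question asked for a vector, the last line pulls the new column out.

```r
library(dplyr)

# Lookup table with phenotype as an explicit column (FITC varies fastest).
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
combinations_df$phenotype <- letters[1:16]

# The five sample rows of main_df from the question.
main_df <- read.table(header = TRUE, text = "
a b FITC d Cy3 f TX_RED h Cy5 j
0 1 1 1 1 0 1 1 1 1
0 1 0 1 1 0 1 0 1 1
1 1 0 0 0 1 1 0 0 0
0 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0
")

# Join on the four marker columns; the other columns of main_df are ignored
# for matching and carried through unchanged.
result <- left_join(main_df, combinations_df,
                    by = c("FITC", "Cy3", "TX_RED", "Cy5"))

# The phenotype vector, in the same row order as main_df.
phenotype <- result$phenotype
```

A left join keeps every row of main_df in its original order, and a row whose marker values match no combination would get NA as its phenotype instead of being dropped.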

I originally thought you'd have to concatenate the columns with tidyr::unite, but that turned out not to be necessary.
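
For completeness, the concatenation idea can also be sketched in base R without any packages: build one string key per row from the four marker columns and look it up with match. This is an alternative to the join, not part of the original answer, and the names markers, key_main and key_comb are made up for the example.

```r
# Lookup table and sample data, rebuilt here so the snippet is self-contained.
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
combinations_df$phenotype <- letters[1:16]

main_df <- read.table(header = TRUE, text = "
a b FITC d Cy3 f TX_RED h Cy5 j
0 1 1 1 1 0 1 1 1 1
0 1 0 1 1 0 1 0 1 1
1 1 0 0 0 1 1 0 0 0
0 1 1 1 1 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0
")

markers <- c("FITC", "Cy3", "TX_RED", "Cy5")

# Paste the four marker columns into one key per row, e.g. "1_1_1_1".
key_main <- do.call(paste, c(main_df[markers], sep = "_"))
key_comb <- do.call(paste, c(combinations_df[markers], sep = "_"))

# match() returns, for each main_df row, the index of the matching combination.
phenotype <- combinations_df$phenotype[match(key_main, key_comb)]

# Attach it as column 11, as the question intended.
main_df <- cbind(main_df, phenotype)
```

Both paste and match are vectorized, so this avoids the nested loops from the question; I have not timed it against the join on a million rows.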
