Match combinations of row values between 2 different data frames

Posted: 2020/5/4 5:18:36. Tags: r, loops, dataframe, pattern-matching

Question

I have a data.frame with 16 different combinations of 4 different cell markers:

> combinations_df

     FITC Cy3 TX_RED Cy5
 a    0   0      0   0
 b    1   0      0   0
 c    0   1      0   0
 d    1   1      0   0
 e    0   0      1   0
 f    1   0      1   0
 g    0   1      1   0
 h    1   1      1   0
 i    0   0      0   1
 j    1   0      0   1
 k    0   1      0   1
 l    1   1      0   1
 m    0   0      1   1
 n    1   0      1   1
 o    0   1      1   1
 p    1   1      1   1
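
For reproducibility, a lookup table like this one can be generated rather than typed by hand. The following is a minimal sketch, assuming the markers are plain 0/1 values and the rows are labelled with the letters a to p as shown above; `expand.grid` varies its first argument fastest, which happens to match the row order of the table.

```r
# Build every 0/1 combination of the four markers.
# expand.grid() varies the first factor fastest, so the rows come out
# in the same order as the table above (a = 0000, b = 1000, c = 0100, ...).
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)

# Label the 16 rows a..p, as in the question.
rownames(combinations_df) <- letters[1:16]

combinations_df
```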

I have my "main" data.frame with 10 columns and thousands of rows:

> main_df
  a b FITC d Cy3 f TX_RED h Cy5 j
1 0 1    1 1   1 0      1 1   1 1
2 0 1    0 1   1 0      1 0   1 1
3 1 1    0 0   0 1      1 0   0 0
4 0 1    1 1   1 0      1 1   1 1
5 0 0    0 0   0 0      0 0   0 0
....

I want to compare each row of main_df against all 16 possible combinations in combinations_df. Then I want to create a new vector that I can later cbind to main_df as column 11.

Sample output

> phenotype
[1] "g" "i" "a" "p" "g"

I thought about doing a while loop within a for loop, checking each combinations_df row against each main_df row.

That sounds like it could work, but I have close to 1,000,000 rows in main_df, so I wanted to see if anybody had a better idea.

I forgot to mention that I want to compare combinations_df only to columns 3, 5, 7 and 9 of main_df. They have the same names as the columns of combinations_df, but that might not be obvious.

Changing the sample data output, since no "t" should be present.

Recommended answer

The dplyr solution is outrageously simple. First you need to put phenotype in combinations_df as an explicit variable, like this:

#   phenotype FITC Cy3 TX_RED Cy5
#1          a    0   0      0   0
#2          b    1   0      0   0
#3          c    0   1      0   0
#4          d    1   1      0   0
# etc
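
One way to get from row names to an explicit column is sketched below in base R. This is an illustration rather than the answerer's own code; it assumes the phenotype labels are stored as the row names of combinations_df, as in the question. `tibble::rownames_to_column()` would be an alternative if the tibble package is available.

```r
# Recreate the lookup table from the question (labels stored as row names).
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combinations_df) <- letters[1:16]

# Copy the row names into a leading 'phenotype' column ...
combinations_df <- data.frame(phenotype = rownames(combinations_df),
                              combinations_df)
# ... and drop the now redundant row names.
rownames(combinations_df) <- NULL

head(combinations_df, 4)
```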

dplyr lets you join on multiple variables, so from here it's a one-liner to look up the phenotypes.

library(dplyr)
left_join(main_df, combinations_df, by=c("FITC", "Cy3", "TX_RED", "Cy5"))

#  a b FITC d Cy3 f TX_RED h Cy5 j phenotype
#1 0 1    1 1   1 0      1 1   1 1         p
#2 0 1    0 1   1 0      1 0   1 1         o
#3 1 1    0 0   0 1      1 0   0 0         e
#4 0 1    1 1   1 0      1 1   1 1         p
#5 0 0    0 0   0 0      0 0   0 0         a
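
To try this end to end, here is a small reproducible sketch that rebuilds the five sample rows from the question and runs the same join. The data construction and the two sanity checks are additions for illustration: `left_join` should keep every row of main_df in its original order, and a missing phenotype would indicate a marker combination that is absent from the lookup table.

```r
library(dplyr)

# Lookup table with phenotype as an explicit column (a..p).
combinations_df <- expand.grid(FITC = c(0, 1), Cy3 = c(0, 1),
                               TX_RED = c(0, 1), Cy5 = c(0, 1))
combinations_df$phenotype <- letters[1:16]

# The five sample rows of main_df shown in the question.
main_df <- data.frame(
  a      = c(0, 0, 1, 0, 0), b   = c(1, 1, 1, 1, 0),
  FITC   = c(1, 0, 0, 1, 0), d   = c(1, 1, 0, 1, 0),
  Cy3    = c(1, 1, 0, 1, 0), f   = c(0, 0, 1, 0, 0),
  TX_RED = c(1, 1, 1, 1, 0), h   = c(1, 0, 0, 1, 0),
  Cy5    = c(1, 1, 0, 1, 0), j   = c(1, 1, 0, 1, 0)
)

result <- left_join(main_df, combinations_df,
                    by = c("FITC", "Cy3", "TX_RED", "Cy5"))

# Sanity checks: no rows gained or lost, and every row found a match.
stopifnot(nrow(result) == nrow(main_df))
stopifnot(!anyNA(result$phenotype))

# The vector the question asked for.
phenotype <- result$phenotype
phenotype
```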

I originally thought you'd have to concatenate columns with tidyr::unite, but that turned out not to be necessary.
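
For readers without dplyr, the concatenation idea can still be sketched in base R. This is a swapped-in alternative, not the answer's method: it pastes the four marker columns into a single key per row and uses `match` to look each key up in the combinations table. The helper names (`marker_cols`, `combo_key`, `main_key`) are made up for this example, and only the four marker columns of main_df are recreated for brevity.

```r
marker_cols <- c("FITC", "Cy3", "TX_RED", "Cy5")

# Lookup table from the question, labels stored as row names.
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combinations_df) <- letters[1:16]

# Marker columns of the five sample rows of main_df.
main_df <- data.frame(
  FITC   = c(1, 0, 0, 1, 0),
  Cy3    = c(1, 1, 0, 1, 0),
  TX_RED = c(1, 1, 1, 1, 0),
  Cy5    = c(1, 1, 0, 1, 0)
)

# Collapse the four markers into one string key per row, e.g. "0_1_1_1".
combo_key <- do.call(paste, c(combinations_df[marker_cols], sep = "_"))
main_key  <- do.call(paste, c(main_df[marker_cols], sep = "_"))

# match() is vectorised, so no explicit loop over rows is needed.
phenotype <- rownames(combinations_df)[match(main_key, combo_key)]
phenotype
```

The result can then be attached with `cbind(main_df, phenotype)`, as the question intended.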
