Match combinations of row values between 2 different data frames
Question
I have a data.frame with 16 different combinations of 4 different cell markers:
combinations_df
FITC Cy3 TX_RED Cy5
a 0 0 0 0
b 1 0 0 0
c 0 1 0 0
d 1 1 0 0
e 0 0 1 0
f 1 0 1 0
g 0 1 1 0
h 1 1 1 0
i 0 0 0 1
j 1 0 0 1
k 0 1 0 1
l 1 1 0 1
m 0 0 1 1
n 1 0 1 1
o 0 1 1 1
p 1 1 1 1
I have my "main" data.frame with 10 columns and thousands of rows.
> main_df
a b FITC d Cy3 f TX_RED h Cy5 j
1 0 1 1 1 1 0 1 1 1 1
2 0 1 0 1 1 0 1 0 1 1
3 1 1 0 0 0 1 1 0 0 0
4 0 1 1 1 1 0 1 1 1 1
5 0 0 0 0 0 0 0 0 0 0
....
I want to compare each row of main_df against all 16 possible combinations from combinations_df. Then I want to create a new vector to later cbind to main_df as column 11.
Sample output
> phenotype
[1] "g" "i" "a" "p" "g"
I thought about doing a while loop within a for loop, checking each combinations_df row against each main_df row.
Sounds like it could work, but I have close to 1,000,000 rows in main_df, so I wanted to see if anybody had a better idea.
I forgot to mention that I want to compare combinations_df only to columns 3, 5, 7 and 9 of main_df. They have the same names as the columns of combinations_df, but that might not be obvious.
Changing the sample data output, since no "t" should be present.
Recommended answer
The dplyr solution is outrageously simple. First you need to put phenotype in combinations_df as an explicit variable, like this:
# phenotype FITC Cy3 TX_RED Cy5
#1 a 0 0 0 0
#2 b 1 0 0 0
#3 c 0 1 0 0
#4 d 1 1 0 0
# etc
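One way to get there, as a minimal base R sketch: it assumes the phenotype letters are currently stored as the row names of combinations_df, which is how the table in the question is printed. The table is rebuilt here with expand.grid only so that the snippet runs on its own; with the real data, only the phenotype assignment would be needed.

```r
# Rebuild the 16-row lookup table from the question. expand.grid varies the
# first column fastest, which reproduces the a..p ordering shown there.
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
rownames(combinations_df) <- letters[1:16]

# Copy the row names into an ordinary column so that a join can return them.
combinations_df$phenotype <- rownames(combinations_df)

# Put phenotype first and drop the now-redundant row names.
combinations_df <- combinations_df[, c("phenotype", "FITC", "Cy3", "TX_RED", "Cy5")]
rownames(combinations_df) <- NULL

head(combinations_df, 4)
```

If the tibble package is available, tibble::rownames_to_column(combinations_df, "phenotype") does the same in one call.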
dplyr lets you join on multiple variables, so from here it's a one-liner to look up the phenotypes.
library(dplyr)
left_join(main_df, combinations_df, by=c("FITC", "Cy3", "TX_RED", "Cy5"))
# a b FITC d Cy3 f TX_RED h Cy5 j phenotype
#1 0 1 1 1 1 0 1 1 1 1 p
#2 0 1 0 1 1 0 1 0 1 1 o
#3 1 1 0 0 0 1 1 0 0 0 e
#4 0 1 1 1 1 0 1 1 1 1 p
#5 0 0 0 0 0 0 0 0 0 0 a
I originally thought you'd have to concatenate columns with tidyr::unite, but this was not the case.
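For comparison, the concatenation idea can also be sketched in base R without any join. This is not part of the answer above: it is an alternative under the assumption that only the phenotype vector is wanted, as the question asks. The five rows of main_df are copied from the question, and the names markers, key_main and key_combo are made up for the illustration.

```r
# Lookup table with phenotype as an explicit column.
combinations_df <- expand.grid(FITC = 0:1, Cy3 = 0:1, TX_RED = 0:1, Cy5 = 0:1)
combinations_df$phenotype <- letters[1:16]

# The five example rows of main_df from the question.
main_df <- data.frame(
  a = c(0, 0, 1, 0, 0), b = c(1, 1, 1, 1, 0), FITC = c(1, 0, 0, 1, 0),
  d = c(1, 1, 0, 1, 0), Cy3 = c(1, 1, 0, 1, 0), f = c(0, 0, 1, 0, 0),
  TX_RED = c(1, 1, 1, 1, 0), h = c(1, 0, 0, 1, 0), Cy5 = c(1, 1, 0, 1, 0),
  j = c(1, 1, 0, 1, 0)
)

# Select the marker columns by name rather than by position (3, 5, 7, 9).
markers <- c("FITC", "Cy3", "TX_RED", "Cy5")

# Collapse the four marker columns of each table into one string key per row.
key_main  <- do.call(paste, c(main_df[markers], sep = "_"))
key_combo <- do.call(paste, c(combinations_df[markers], sep = "_"))

# match() is vectorised, so no explicit loop over the rows is needed.
phenotype <- combinations_df$phenotype[match(key_main, key_combo)]
phenotype
# [1] "p" "o" "e" "p" "a"
```

The result agrees with the phenotype column produced by the join, and cbind(main_df, phenotype) then adds it as column 11. A row of main_df whose marker values match no combination would get NA.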