Populating a data frame with corresponding values from another
Question
I have a data frame containing values read in from an experiment with independent variables A and B which doesn't cover all possible permutations of A and B. I need to create a data frame which does contain all permutations, with zeros in those places where that particular pair of values isn't present in the data.
To create some example data:
interactions <- unique(data.frame(A = sample(1:5, 10, replace=TRUE),
B = sample(1:5, 10, replace=TRUE)))
interactions <- interactions[interactions$A < interactions$B, ]
interactions$val <- runif(nrow(interactions))
possible.interactions <- data.frame(t(combn(1:5, 2)))
names(possible.interactions) <- c('A', 'B')
which creates
interactions
A B val
1 5 0.6881106
1 2 0.5286560
2 4 0.5026426
and
possible.interactions
A B
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
I would like the output to be
A B val
1 2 0.5286560
1 3 NA
1 4 NA
1 5 0.6881106
2 3 NA
2 4 0.5026426
2 5 NA
3 4 NA
3 5 NA
4 5 NA
What is the fastest way to do this?
Recommended answer
Here is a base solution that is much faster (~10x) than merge:
possible.interactions$val <- interactions$val[
match(
do.call(paste, possible.interactions),
do.call(paste, interactions[1:2])
) ]
This produces the following (note that it differs from your expected output because you didn't set a seed):
# A B val
# 1 1 2 0.59809242
# 2 1 3 0.92861520
# 3 1 4 0.64279549
# 4 1 5 NA
# 5 2 3 0.03554058
# 6 2 4 NA
# 7 2 5 NA
# 8 3 4 NA
# 9 3 5 NA
# 10 4 5 NA
This assumes that A and B do not contain spaces and that interactions has no duplicate A-B pairs (match will always return the first matching row).
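The two assumptions above can be checked, and the first one sidestepped, with a small variation. This is only a sketch, not the answer's own code: the helper name make_key and the "\r" separator are my own choices for illustration, and it assumes the values never contain that separator. The fixed example data is copied from the question's printed output.

```r
# Small fixed example data (values taken from the question's printout).
interactions <- data.frame(A = c(1, 1, 2), B = c(5, 2, 4),
                           val = c(0.6881106, 0.5286560, 0.5026426))
possible.interactions <- data.frame(t(combn(1:5, 2)))
names(possible.interactions) <- c("A", "B")

# Hypothetical helper: build one key per row from columns A and B.
# sep = "\r" is an assumed separator that should not occur in the data.
make_key <- function(df) paste(df$A, df$B, sep = "\r")

# match() only ever returns the first hit, so check for duplicate pairs first.
stopifnot(anyDuplicated(make_key(interactions)) == 0)

# Look up each possible pair; pairs missing from interactions get NA.
possible.interactions$val <- interactions$val[
  match(make_key(possible.interactions), make_key(interactions))
]
```

Building the key from the two named columns, rather than from every column via do.call(paste, ...), also means the lookup still works if possible.interactions has already gained a val column.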
And a data.table version:
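library(data.table)
possible.DT <- data.table(possible.interactions)
DT <- data.table(interactions, key=c("A", "B"))  # key the lookup table on A and B
DT[possible.DT]  # join on the key: one row per row of possible.DT, NA where unmatched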
This is only worthwhile, though, if your tables are large or you want the other benefits of data.table. Once you include the overhead of creating and keying the tables, I've found the speed to be comparable to match in simple cases. I'm sure there are cases where data.table is much faster, especially if you key once and then use that key many times.
For completeness, here is the merge version:
merge(possible.interactions, interactions, all.x = TRUE)
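The question asked for zeros rather than NA where a pair is missing. Whichever approach is used, the NAs can be replaced afterwards. A minimal sketch, using fixed data copied from the question's printout (the name result is just for illustration):

```r
# Small fixed example data (values taken from the question's printout).
interactions <- data.frame(A = c(1, 1, 2), B = c(5, 2, 4),
                           val = c(0.6881106, 0.5286560, 0.5026426))
possible.interactions <- data.frame(t(combn(1:5, 2)))
names(possible.interactions) <- c("A", "B")

# Left join on the shared columns A and B; unmatched pairs get NA.
result <- merge(possible.interactions, interactions, all.x = TRUE)

# Replace the NAs with zeros, as the question asked for.
result$val[is.na(result$val)] <- 0
```

Note that this makes a missing pair indistinguishable from a measured value of exactly zero, so it may be worth keeping the NA version around if that distinction matters.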