Populating a data frame with corresponding values from another


Question

I have a data frame containing values read in from an experiment with independent variables A and B. The data do not cover all possible permutations of A and B. I need to create a data frame that does contain all permutations, with zeros in those places where that particular pair of values isn't present in the data.

To create some example data:

interactions <- unique(data.frame(A = sample(1:5, 10, replace=TRUE), 
                                  B = sample(1:5, 10, replace=TRUE)))
interactions <- interactions[interactions$A < interactions$B, ]
interactions$val <- runif(nrow(interactions))

possible.interactions <- data.frame(t(combn(1:5, 2)))
names(possible.interactions) <- c('A', 'B')

This creates:

interactions
A B       val
1 5 0.6881106
1 2 0.5286560
2 4 0.5026426

possible.interactions
A B
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5

The desired output is:

A B  val
1 2  0.5286560
1 3  NA
1 4  NA
1 5  0.6881106
2 3  NA
2 4  0.5026426
2 5  NA
3 4  NA
3 5  NA
4 5  NA

What is the fastest way to do this?

Recommended answer

Here is a base R solution that is much faster (~10x) than merge:

possible.interactions$val <- interactions$val[
  match(
    do.call(paste, possible.interactions),
    do.call(paste, interactions[1:2])
  )
]

This produces the following (the values differ from those in the question because no seed was set):

#    A B        val
# 1  1 2 0.59809242
# 2  1 3 0.92861520
# 3  1 4 0.64279549
# 4  1 5         NA
# 5  2 3 0.03554058
# 6  2 4         NA
# 7  2 5         NA
# 8  3 4         NA
# 9  3 5         NA
# 10 4 5         NA

This assumes that A and B do not contain spaces and that interactions has no duplicate A-B pairs (match always returns the first match).
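The two assumptions above can be made explicit in code. The following is a minimal sketch, not part of the original answer: the helper name `fill_from`, its `keys`, `fill`, and `sep` arguments, and the choice of `"\r"` as separator are all illustrative assumptions. It pastes the key columns with a separator that is assumed not to occur in the data, stops if the source has duplicate pairs, and optionally fills unmatched pairs with a value such as zero, which is what the question asks for.

```r
# Hypothetical helper (not from the original answer): copy `val` from `src`
# into `target` for every matching key pair, using a pasted key and match().
fill_from <- function(target, src, keys = c("A", "B"), fill = NA, sep = "\r") {
  # Build one string key per row; `sep` is assumed not to occur in the data.
  src.key    <- do.call(paste, c(as.list(src[keys]), sep = sep))
  target.key <- do.call(paste, c(as.list(target[keys]), sep = sep))

  # match() silently uses the first hit, so reject duplicate pairs up front.
  if (anyDuplicated(src.key) > 0) stop("duplicate key pairs in src")

  idx <- match(target.key, src.key)
  val <- src$val[idx]
  val[is.na(idx)] <- fill   # only pairs with no match receive `fill`
  target$val <- val
  target
}

# Usage with the three rows shown in the question.
interactions <- data.frame(A = c(1, 1, 2), B = c(5, 2, 4),
                           val = c(0.6881106, 0.5286560, 0.5026426))
possible.interactions <- setNames(data.frame(t(combn(1:5, 2))), c("A", "B"))
result <- fill_from(possible.interactions, interactions, fill = 0)
```

Leaving `fill` at its default of NA reproduces the behaviour of the one-liner above; passing `fill = 0` gives the zeros described in the question.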

A data.table version:

library(data.table)

possible.DT <- data.table(possible.interactions)
DT <- data.table(interactions, key = c("A", "B"))
DT[possible.DT]  # look up each row of possible.DT in DT by its key (A, B)

This is only worthwhile, though, if your tables are large or you have a use for the other benefits of data.table. In simple cases I have found its speed comparable to match once you include the overhead of creating and keying the tables. I'm sure there are cases where data.table is much faster, especially if you key once and then reuse that key many times.

For completeness, here is the merge version:

merge(possible.interactions, interactions, all.x = TRUE)
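To check that the match and merge approaches agree, and to time them on your own data, something like the following sketch may help. It uses a small fixed example rather than the random data from the question, and the variable names `by.match`, `by.merge`, and `same` are illustrative. No timings are quoted because they depend on the machine and the data size.

```r
# Small fixed example so the comparison is reproducible.
interactions <- data.frame(A = c(1L, 1L, 2L), B = c(5L, 2L, 4L),
                           val = c(0.6881106, 0.5286560, 0.5026426))
possible.interactions <- setNames(data.frame(t(combn(1:5, 2))), c("A", "B"))

# match() approach: keeps the row order of possible.interactions.
by.match <- possible.interactions
by.match$val <- interactions$val[
  match(do.call(paste, possible.interactions),
        do.call(paste, interactions[1:2]))
]

# merge() approach: a left join on the shared columns A and B.
by.merge <- merge(possible.interactions, interactions, all.x = TRUE)

# merge() may reorder rows, so sort both results before comparing.
by.match <- by.match[order(by.match$A, by.match$B), ]
by.merge <- by.merge[order(by.merge$A, by.merge$B), ]
same <- isTRUE(all.equal(by.match$val, by.merge$val))

# Rough timing of repeated calls; results vary by machine and data size.
system.time(for (i in 1:200) {
  merge(possible.interactions, interactions, all.x = TRUE)
})
system.time(for (i in 1:200) {
  interactions$val[match(do.call(paste, possible.interactions),
                         do.call(paste, interactions[1:2]))]
})
```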
