How can I calculate an inner product with an arbitrary number of columns using ddply?
Question
For each row of a data frame, I want to compute the inner product of the first D columns with a given array W. I am trying the following:
library(plyr)  # provides ddply
W <- c(1, 2, 3)
ddply(df, .(id), transform, inner_product = c(col1, col2, col3) %*% W)
This works, but in general I may have an arbitrary number of columns. Can the expression above be generalized to handle that case?
Update:
This is an updated example, as requested in the comments:
library(plyr)     # provides ddply
library(kernlab)  # provides the spam data set
data(spam)
W <- c(1, 2, 3)
spamdf <- head(spam)
spamdf$id <- seq_len(nrow(spamdf))  # one id per row
df_out <- ddply(spamdf, .(id), transform, inner_product = c(make, address, all) %*% W)
> W
[1] 1 2 3
> spamdf[1,]
make address all num3d our over remove internet order mail receive will
1 0 0.64 0.64 0 0.32 0 0 0 0 0 0 0.64
people report addresses free business email you credit your font num000
1 0 0 0 0.32 0 1.29 1.93 0 0.96 0 0
money hp hpl george num650 lab labs telnet num857 data num415 num85
1 0 0 0 0 0 0 0 0 0 0 0 0
technology num1999 parts pm direct cs meeting original project re edu table
1 0 0 0 0 0 0 0 0 0 0 0 0
conference charSemicolon charRoundbracket charSquarebracket charExclamation
1 0 0 0 0 0.778
charDollar charHash capitalAve capitalLong capitalTotal type id
1 0 0 3.756 61 278 spam 1
> df_out[1,]
make address all num3d our over remove internet order mail receive will
1 0 0.64 0.64 0 0.32 0 0 0 0 0 0 0.64
people report addresses free business email you credit your font num000
1 0 0 0 0.32 0 1.29 1.93 0 0.96 0 0
money hp hpl george num650 lab labs telnet num857 data num415 num85
1 0 0 0 0 0 0 0 0 0 0 0 0
technology num1999 parts pm direct cs meeting original project re edu table
1 0 0 0 0 0 0 0 0 0 0 0 0
conference charSemicolon charRoundbracket charSquarebracket charExclamation
1 0 0 0 0 0.778
charDollar charHash capitalAve capitalLong capitalTotal type id inner_product
1 0 0 3.756 61 278 spam 1 3.2
The example above computes the inner product of the first three columns of the spam data set (available in the kernlab package) with the array W = (1, 2, 3). Here I have explicitly specified the first three columns as c(make, address, all). Thus df_out[1, "inner_product"] = 3.2.
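As a quick check of that value, the arithmetic can be reproduced by hand. This is only a sketch: the three numbers are copied from the first row of the printout above, and the variable names x and ip are chosen here for illustration.

```r
# Values of make, address and all in the first row of spamdf (from the printout)
x <- c(make = 0, address = 0.64, all = 0.64)
W <- c(1, 2, 3)

# Inner product as a sum of element-wise products: 0*1 + 0.64*2 + 0.64*3
ip <- sum(x * W)
print(ip)  # about 3.2

# %*% on two equal-length vectors returns the same value as a 1 x 1 matrix
stopifnot(isTRUE(all.equal(ip, drop(x %*% W))))
```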
Instead, I want to compute the inner product over all the columns without having to list each one. Converting to a matrix and back to a data frame seems like an expensive operation?
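Recommended answer
The following should work:
- Convert each chunk to a matrix
- Perform the matrix multiplication
- Convert the result to a data.frame
Code: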
library(plyr)  # provides ddply
set.seed(1)
df <- data.frame(
  id   = sample(1:5, 20, replace = TRUE),
  col1 = runif(20),
  col2 = runif(20),
  col3 = runif(20),
  col4 = runif(20)
)
W <- c(1, 2, 3, 4)
# x[, -1] drops the id column; each remaining row is multiplied by W
ddply(df, .(id), function(x) as.data.frame(as.matrix(x[, -1]) %*% W))
Result:
id V1
1 1 4.924994
2 1 5.076043
3 2 7.053864
4 2 5.237132
5 2 6.307620
6 2 3.413056
7 2 5.182214
8 2 7.623164
9 3 5.194714
10 3 6.733229
11 4 4.122548
12 4 3.569013
13 4 4.978939
14 4 5.513444
15 4 5.840900
16 4 6.526522
17 5 3.530220
18 5 3.549646
19 5 4.340173
20 5 3.955517
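Because the inner product is computed row by row, the grouping step may not be needed at all. The sketch below is not part of the original answer. It applies a single matrix product to the whole data frame in base R, assuming that every column other than id is numeric and that W holds one weight per such column. The name value_cols is hypothetical and introduced only for illustration.

```r
# Same example data as in the answer above
set.seed(1)
df <- data.frame(
  id   = sample(1:5, 20, replace = TRUE),
  col1 = runif(20),
  col2 = runif(20),
  col3 = runif(20),
  col4 = runif(20)
)
W <- c(1, 2, 3, 4)

# Pick the value columns by name instead of listing them one by one
# (value_cols is a hypothetical name, not from the original answer)
value_cols <- setdiff(names(df), "id")
stopifnot(length(value_cols) == length(W))

# One conversion to a matrix, one product; %*% returns an n x 1 matrix,
# which as.vector() flattens so it can be stored as an ordinary column
df$inner_product <- as.vector(as.matrix(df[, value_cols, drop = FALSE]) %*% W)
head(df)
```

Unlike the ddply version, this keeps the rows in their original order and retains all existing columns. To use only the first D columns, as in the question, df[, seq_len(D), drop = FALSE] could be used in place of the name-based selection, with W of length D.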