Subtract values of a single row from all relevant columns in a data frame
Question
I have the following data set:
foo=data.frame(index=rep(1:10,3),
type=rep(c("A","B","C"),each=10),
ping=rnorm(30),
pong=runif(30))
I want to subtract the values of the columns ping and pong in the row where index==5 and type=="B" from the whole columns ping and pong.
This works:
vec=matrix(subset(foo,index==5 & type=="B",select=ping:pong),2,1)
foo[,c("ping","pong")]=foo[,c("ping","pong")]-vec
However, I'm surprised that I had to specify vec as a column vector rather than a row vector. I would have thought that I needed to subtract the same row vector from all (similar subsets of the) rows of foo. Can you explain why this is? Also, if the same result can be obtained with simpler or cleaner code, please let me know.
Recommended answer
You want to do the following:
myselect <- with(foo, index ==5 & type == "B")
mycol <- c('ping','pong')
foo[, mycol] <- foo[, mycol] - as.list(foo[myselect, mycol])
vec should be a list, because subtracting a list from a data frame is done element by element: the first list element is subtracted from the first column, the second from the second column. That is what you want, and it is also what your code is actually doing.
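As a minimal sketch of that column-by-column behaviour, the toy data frame below (hypothetical values, not the question's random data) subtracts its second row, converted to a list, from every row. The names df, ref and res are made up for this illustration.

```r
# Toy data frame with known values (an assumption for illustration only)
df <- data.frame(ping = c(1, 2, 3), pong = c(10, 20, 30))

# Reference row as a plain list: one element per column
ref <- as.list(df[2, ])

# Data frame minus list: element j of the list is subtracted from column j
res <- df - ref
print(res)
```

Each list element has length one here, so it is recycled down its own column and never spills into the next one.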
First of all, you don't actually specify vec as a matrix. If you use matrix() instead of as.matrix() on a list, you get a list. And since a data frame is essentially a list, matrix() gives you back a list with a "dim" attribute. That attribute makes it look like a matrix, but:
> str(vec)
List of 2
$ : num 0.704
$ : num 0.164
- attr(*, "dim")= int [1:2] 2 1
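The difference between matrix() and as.matrix() on a one-row data frame can be checked directly. The sketch below uses a hypothetical one-row data frame called row (fixed values instead of the random ones in the question); m1 and m2 are illustrative names.

```r
# One-row data frame standing in for the subset() result (assumed values)
row <- data.frame(ping = 0.5, pong = 0.25)

m1 <- matrix(row, 2, 1)   # list with a dim attribute, not a numeric matrix
m2 <- as.matrix(row)      # genuine 1 x 2 numeric matrix

is.list(m1)        # the elements are still list elements
is.data.frame(m1)  # the data.frame class has been dropped
dim(m1)
is.numeric(m2)
dim(m2)
```

Because m1 is still a list of length two, subtracting it from a two-column data frame pairs each element with one column, which is why the question's code gives the intended result.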
What you are relying on here is a side effect of the function matrix(). It also drops other attributes, so it removes the data.frame class from vec and leaves a plain list. If vec were still a data frame, the subtraction would not work: an arithmetic operator between two data frames is only defined when both have the same dimensions, and that is not the case here.
> vec=subset(foo,index==5 & type=="B",select=ping:pong)
> foo[,c("ping","pong")]-vec
Error in Ops.data.frame(foo[, c("ping", "pong")], vec) :
‘-’ only defined for equally-sized data frames
You also shouldn't make it a real matrix. If you do, R recycles the matrix values column-wise along the data frame. That means it subtracts the first value of vec from the first value of foo$ping, the second value of vec from the second value of foo$ping, the first value of vec again from the third value of foo$ping, and so forth. It doesn't matter which way round you orient the matrix; the result is always the same (and wrong!):
mytest<- matrix(c(-10,10), nrow = 1)
mytest2 <- t(mytest)
myfoo <- foo[,c('ping','pong')]
all.equal(myfoo - mytest, myfoo - mytest2)
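To make the recycling visible, here is a small sketch on a four-row data frame with known values (hypothetical data; df, row_mat, wrong, intended and swept are names invented for this example). It also shows the list form from the answer, plus sweep() from base R as one possible alternative that is not part of the original answer.

```r
# Small data frame with known values (assumed for illustration)
df <- data.frame(ping = c(1, 2, 3, 4), pong = c(10, 20, 30, 40))
row_mat <- matrix(c(-10, 10), nrow = 1)

# Numeric matrix: values are recycled down each column, alternating -10 and 10
wrong <- df - row_mat
same_when_transposed <- identical(wrong, df - t(row_mat))

# List: one value per column, which is the intended row-wise subtraction
intended <- df - as.list(c(-10, 10))

# sweep() over columns (MARGIN = 2) with the default FUN = "-"
swept <- sweep(df, 2, c(-10, 10))

print(wrong)
print(intended)
```

In wrong, the ping column alternates between adding and subtracting 10, whereas intended and swept shift the whole ping column by +10 and the whole pong column by -10.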