Subtract values of a single row from all relevant columns in a data frame


Question

I have the following dataset:

foo=data.frame(index=rep(1:10,3),
               type=rep(c("A","B","C"),each=10),
               ping=rnorm(30),
               pong=runif(30))

I want to subtract the values of the columns ping and pong in the row where index==5 and type=="B" from the whole columns ping and pong. This works:

vec=matrix(subset(foo,index==5 & type=="B",select=ping:pong),2,1)
foo[,c("ping","pong")]=foo[,c("ping","pong")]-vec

However, I'm surprised that I had to specify vec as a column vector rather than a row vector. I would have thought that I needed to subtract the same row vector from every row of foo (or from the matching subset of rows). Can you explain why this is? Also, if the same result can be obtained with simpler or cleaner code, please let me know.

Recommended answer

You want to do the following:

myselect <- with(foo, index ==5 & type == "B")
mycol <- c('ping','pong')

foo[, mycol] <- foo[, mycol] - as.list(foo[myselect, mycol])
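To see what the as.list() step does, here is a minimal sketch on a small deterministic data frame. The names toy, ref, and res are illustrative only and not part of the original question; base R is assumed.

```r
# Hypothetical toy data standing in for the poster's foo
toy <- data.frame(ping = c(1, 2, 3), pong = c(10, 20, 30))

# Reference row turned into a plain list: list(ping = 2, pong = 20)
ref <- as.list(toy[2, ])

# The list is matched to the columns one element at a time:
# ref$ping is subtracted from toy$ping, ref$pong from toy$pong
res <- toy - ref
print(res)
#   ping pong
# 1   -1  -10
# 2    0    0
# 3    1   10
```

The reference row itself becomes all zeros, which is a quick sanity check that each column was shifted by its own value.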

vec should be a list, because subtracting a list from a data frame is done element by element: each list element is subtracted from the corresponding column. That is what you want, and it is also what you are actually doing:

First of all, you don't actually create vec as a matrix. If you use matrix() instead of as.matrix() on a list, you get a list back. And since a data frame is essentially a list, matrix() gives you a list with an attribute "dim". That attribute makes it look like a matrix, but:

> str(vec)
List of 2
 $ : num 0.704
 $ : num 0.164
 - attr(*, "dim")= int [1:2] 2 1
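The same thing can be checked with a few predicates. This is a minimal sketch on a hypothetical toy data frame (toy, row2, and vec2 are illustrative names, not taken from the question), assuming base R only:

```r
# Hypothetical toy data standing in for the poster's foo
toy <- data.frame(ping = c(1, 2, 3), pong = c(10, 20, 30))

# One-row data frame, comparable to what subset() returns in the question
row2 <- toy[2, ]

# matrix() on a data frame yields a list that carries a "dim" attribute
vec2 <- matrix(row2, 2, 1)

is.data.frame(row2)  # TRUE: still a data frame
is.data.frame(vec2)  # FALSE: the data.frame class is gone
is.list(vec2)        # TRUE: the underlying type is a list
dim(vec2)            # 2 1
```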

What you rely on here is a side effect of the function matrix(). It also drops the other attributes, so it removes the data.frame information from vec and turns it into a plain list. If vec were still a data frame, this wouldn't work: an arithmetic operator can only be applied to two data frames when both have the same dimensions, and that is not the case here.

> vec=subset(foo,index==5 & type=="B",select=ping:pong)
> foo[,c("ping","pong")]-vec
Error in Ops.data.frame(foo[, c("ping", "pong")], vec) : 
  ‘-’ only defined for equally-sized data frames

You also shouldn't make it a true numeric matrix. If you do, R recycles the values of the matrix down the columns of the data frame. That means it subtracts the first value of vec from the first value of foo$ping, the second value of vec from the second value of foo$ping, the first value of vec again from the third value of foo$ping, and so forth. It doesn't matter which way you orient the matrix; you always get the same (wrong!) result:

mytest <- matrix(c(-10, 10), nrow = 1)
mytest2 <- t(mytest)
myfoo <- foo[,c('ping','pong')]
all.equal(myfoo - mytest, myfoo - mytest2)
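To make the recycling visible, here is a small sketch with hypothetical toy values (toy, m, wrong, and correct are illustrative names, not from the original post). It contrasts subtracting a numeric matrix with subtracting a list:

```r
# Hypothetical toy data with an even number of rows
toy <- data.frame(ping = c(1, 2, 3, 4), pong = c(10, 20, 30, 40))
m <- matrix(c(-10, 10), nrow = 1)

# Matrix: the values -10, 10 are recycled down every column
wrong <- toy - m
wrong$ping  # 11 -8 13 -6
wrong$pong  # 20 10 40 30

# Transposing the matrix changes nothing
all.equal(wrong, toy - t(m))  # TRUE

# List: one element per column, which is the intended behaviour
correct <- toy - list(-10, 10)
correct$ping  # 11 12 13 14
correct$pong  # 0 10 20 30
```

With the matrix, both columns receive the alternating pattern -10, 10, -10, 10; with the list, ping is shifted by -10 throughout and pong by 10 throughout.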
