R ddply, applying if and ifelse functions
Question
I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have three questions about the results.
Given:
mydf<- data.frame(c(12,34,9,3,22,55),c(1,2,1,1,2,2)
, c(0,1,2,1,1,2))
colnames(mydf)[1] <- 'n'
colnames(mydf)[2] <- 'x'
colnames(mydf)[3] <- 'x1'
mydf looks like this:
n x x1
1 12 1 0
2 34 2 1
3 9 1 2
4 3 1 1
5 22 2 1
6 55 2 2
Question #1
If I do:
k <- function(x) {
mydf$z <- ifelse(x == 1, 0, mydf$n)
return (mydf)
}
mydf <- ddply(mydf, c("x") , .fun = k, .inform = TRUE)
I get the following error:
Error in `$<-.data.frame`(`*tmp*`, "z", value = structure(c(12, 34, 9, :
replacement has 3 rows, data has 6
Error: with piece 1:
n x x1
1 12 1 0
2 9 1 2
3 3 1 1
I get this error regardless of whether I specify the variable to split by as c("x"), "x", or .(x). I don't understand why I'm getting this error message.
Question #2
But what I really want to do is set up an if/else function, because my dataset has variables x1, x2, x3, and x4 and I want to take those variables into account as well. When I try something simple such as:
j <- function(x) {
if(x == 1){
mydf$z <- 0
} else {
mydf$z <- mydf$n
}
return(mydf)
}
mydf <- ddply(mydf, x, .fun = j, .inform = TRUE)
I get:
Warning messages:
1: In if (x == 1) { :
the condition has length > 1 and only the first element will be used
2: In if (x == 1) { :
the condition has length > 1 and only the first element will be used
Question #3
I'm confused about when to use function() and when to use function(x). Using function() for either j() or k() gives me a different error:
Error in .fun(piece, ...) : unused argument (piece)
Error: with piece 1:
n x x1 z
1 12 1 0 12
2 9 1 2 9
3 3 1 1 3
4 12 1 0 12
5 9 1 2 9
6 3 1 1 3
7 12 1 0 12
8 9 1 2 9
9 3 1 1 3
10 12 1 0 12
11 9 1 2 9
12 3 1 1 3
where column z is not correct. Yet I see a lot of functions written as function().
I sincerely appreciate any comments that can help me out with this.
Recommended answer
There's a lot that needs explaining here. Let's start with the simplest case. In your first example, all you need is:
mydf$z <- with(mydf,ifelse(x == 1,0,n))
An equivalent ddply solution might look like this:
ddply(mydf,.(x),transform,z = ifelse(x == 1,0,n))
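As a quick check, the two approaches can be compared side by side. This is a minimal sketch that assumes the plyr package is installed; the names base_result, plyr_result, and base_sorted are illustrative only and do not appear in the question. One thing it makes visible is that ddply stacks its pieces in group order, so the rows come back sorted by x rather than in their original order.

```r
library(plyr)

mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# Vectorised base R version: ifelse() works on whole columns, no grouping needed
base_result <- mydf
base_result$z <- with(base_result, ifelse(x == 1, 0, n))

# ddply version: transform() is applied to each piece of mydf split by x
plyr_result <- ddply(mydf, .(x), transform, z = ifelse(x == 1, 0, n))

# Reorder the base R result by x so the rows line up with ddply's output
base_sorted <- base_result[order(base_result$x), ]
same_z <- isTRUE(all.equal(base_sorted$z, plyr_result$z))
print(same_z)
```

Since the condition only depends on values within each row, the grouping adds nothing here, which is why the one-line base R version is the simpler choice.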
Probably your biggest source of confusion is that you don't seem to understand what is being passed as arguments to functions within ddply.
Consider your first attempt:
k <- function(x) {
mydf$z <- ifelse(x == 1, 0, mydf$n)
return (mydf)
}
ddply
的工作方式是根据x
列中的值将mydf
拆分为几个较小的数据帧.这意味着每次ddply
调用k
时,传递给k
的参数都是数据帧.具体来说,您的主要数据帧的一个子集.
The way ddply
works is that it splits mydf
up into several, smaller data frame, based on the values in the column x
. That means that each time ddply
calls k
, the argument passed to k
is a data frame. Specifically, a subset of you primary data frame.
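The splitting step can be imitated in base R to see what each call receives. The sketch below is only an approximation of the split-apply-combine idea using split(), lapply(), and rbind(); it is not how plyr is implemented internally, and the helper describe and the variables pieces and overview are hypothetical names introduced for illustration.

```r
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# Split mydf into a list of smaller data frames, one per distinct value of x
pieces <- split(mydf, mydf$x)

# Hypothetical helper: report what a function applied per group actually sees
describe <- function(piece) {
  data.frame(x     = piece$x[1],
             rows  = nrow(piece),
             cols  = ncol(piece),
             is_df = is.data.frame(piece))
}

# Apply the helper to each piece and stack the results into one data frame
overview <- do.call(rbind, lapply(pieces, describe))
print(overview)
```

Each piece here has 3 rows and all 3 columns, while mydf itself has 6 rows, which is consistent with the "replacement has 3 rows, data has 6" message in the question.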
So within k, x is a subset of mydf, with all the columns. You should not be trying to modify mydf from within k. Modify x, and then return the modified version. (That is, if you must write your own function; the options I showed above are better.) So we might rewrite your k like this:
k <- function(x) {
x$z <- ifelse(x$x == 1, 0, x$n)
return (x)
}
Note that you've made things confusing by using x both as the argument to k and as the name of one of the columns.
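One way to avoid that name clash is to give the argument a different name. The following is a minimal sketch, assuming plyr is installed; the argument name piece and the variable result are illustrative choices, not something the question or plyr requires.

```r
library(plyr)

mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# 'piece' is the subset of mydf for one value of x; piece$x is the column
k <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece  # return the modified subset, not mydf
}

# Store the output under a new name so the original mydf is left untouched
result <- ddply(mydf, .(x), k)
print(result)
```

Assigning the output to a new variable rather than back to mydf also makes it easier to rerun the call while experimenting, since the input never changes.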