How to avoid implicit character conversion when using apply on a data frame
Question
When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)
But:
apply(df, 1, function(y) class(y["t2"]))
## [1] "character" "character" "character" "character" "character" "character"
## [7] "character" "character" "character" "character"
Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?
Edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find the k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?
Recommended answer
Let's wrap up multiple comments into an explanation.
- Using apply converts a data.frame to a matrix. A matrix can hold only one type, so every column is coerced to the least restrictive type that can represent all of them. In this case, that type is character.
- You're supplying 1 to apply's MARGIN argument. This applies the function by row, which makes you even worse off: each row is handed to your function as a single vector, so values from columns of different classes are mixed together. apply is designed for matrices (and data.frames that convert cleanly to one), not for rows of mixed classes. It is not the right tool for this job.
- In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:
Code:
df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
sapply(df[, "t2"], class)
lapply(df[, "t2"], class)
## [[1]]
## [1] "POSIXct" "POSIXt"
##
## [[2]]
## [1] "POSIXct" "POSIXt"
##
## [[3]]
## [1] "POSIXct" "POSIXt"
##
## .
## .
## .
##
## [[9]]
## [1] "POSIXct" "POSIXt"
##
## [[10]]
## [1] "POSIXct" "POSIXt"
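The coercion described in the first bullet can be checked directly. The following is a minimal sketch on the same toy data as the question; as.POSIXct and tz = "UTC" are assumptions added here to keep the example deterministic. It inspects the matrix that apply would work on, then lists the class of each column with lapply, which works because a data.frame is a list of columns.

```r
df <- data.frame(v = 1:10, t = 1:10)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# apply() works on a matrix; as.matrix() shows what that matrix looks like
m <- as.matrix(df)
typeof(m)           # the whole matrix has a single type: character
class(m[1, "t2"])   # so each element of a row is a character string

# Iterating over the columns of the data.frame keeps each column's class
col_classes <- lapply(df, class)
str(col_classes)
```

Because lapply never builds a matrix, the date-time column stays POSIXct.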
In general, choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
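For the row-wise task described in the question's edit, one option is to iterate over row indices instead of rows, so each value is taken from its own column and keeps its class. This is only a sketch under stated assumptions: the sample data, the body of f, and the default k = 3 are all hypothetical, written to match the question's description rather than the asker's actual code.

```r
# Hypothetical data: column names follow the question, values are made up
base <- as.POSIXct("2013-08-13", tz = "UTC")
df <- data.frame(
  v1 = c("a", "a", "a", "a", "b", "b"),
  v2 = c(1, 2, 3, 4, 10, 20),
  t2 = base + c(10, 20, 30, 40, 10, 20),
  t3 = base + c(5, 15, 25, 35, 5, 15)
)

# Hypothetical f(): mean of v2 over the k rows whose t3 is closest to,
# but lower than, t2, restricted to rows with the same v1
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]
  mean(head(cand$v2, k))
}

# Iterate over row indices; df$t2[i] is still POSIXct, nothing is coerced
df$stat <- vapply(
  seq_len(nrow(df)),
  function(i) f(df$t2[i], df$v1[i], df),
  numeric(1)
)
df$stat
```

vapply is used rather than sapply so that the result is guaranteed to be a numeric vector with one value per row. For large data this row-by-row scan may be slow, in which case a grouped or join-based approach might be worth considering.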
May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.