How to avoid implicit character conversion when using apply on a data frame


Question

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)

But:

 apply(df, 1, function(y) class(y["t2"]))
 ## [1] "character" "character" "character" "character" "character" "character"
 ## [7] "character" "character" "character" "character"

Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?

Edit
My df has two timestamps (say, t2 and t3) and some other fields (say, v1 and v2). For each row with a given t2, I want to find the k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there a better way to do such things in R?

Recommended answer

Let's wrap up multiple comments into an explanation.

  1. Using apply converts the data.frame to a matrix. A matrix holds a single type, so every column is coerced to the least restrictive class that can represent all of them. In this case that class is character.
  2. You're supplying 1 to apply's MARGIN argument. This applies the function by row, which makes things worse because each row now mixes classes together. In this scenario you're using apply, which is designed for matrices and data.frames, on what is effectively a vector. It is not the right tool for the job.
  3. In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column, as seen below:

Code:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

sapply(df[, "t2"], class)
lapply(df[, "t2"], class)

## [[1]]
## [1] "POSIXct" "POSIXt" 
## 
## [[2]]
## [1] "POSIXct" "POSIXt" 
## 
## [[3]]
## [1] "POSIXct" "POSIXt" 
## 
## .
## .
## . 
## 
## [[9]]
## [1] "POSIXct" "POSIXt" 
## 
## [[10]]
## [1] "POSIXct" "POSIXt" 
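The coercion described in point 1 can be seen without calling apply at all, because apply converts a data.frame with as.matrix before it iterates. The following is a minimal sketch that reuses the df from above; the names m and m_num are introduced here only for illustration.

```r
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

# A matrix can hold only one type, so the date-time column
# forces the whole matrix to character.
m <- as.matrix(df)
typeof(m)
## [1] "character"

# The column inside the data.frame is untouched and keeps its class.
class(df$t2)

# Restricting to the numeric columns avoids the character coercion.
m_num <- as.matrix(df[, c("v", "t")])
is.numeric(m_num)
## [1] TRUE
```

If apply is really needed, subsetting to columns of one type first, as in the last step, keeps the values from being turned into strings.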

In general, choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
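For the row-wise task described in the question's edit, one option is to iterate over row indices instead of over rows of a coerced matrix, so that each value keeps its class. This is only a sketch under stated assumptions: the asker's f is not shown, so the f below is a hypothetical stand-in, the sample data are made up, and k = 3 with the mean of v2 is just the example mentioned in the question.

```r
# Made-up example data: two time stamps, a grouping field v1 and a value v2.
base <- as.POSIXct("2013-08-13", tz = "UTC")
df <- data.frame(
  v1 = rep(c("a", "b"), each = 5),
  v2 = 1:10,
  t2 = base + 10 * (1:10),
  t3 = base + 10 * (1:10) - 5,
  stringsAsFactors = FALSE
)

# Hypothetical f: mean of v2 over the k rows whose t3 is closest to,
# but lower than, t2, restricted to rows with the same v1.
# Note that a row can be its own candidate when its t3 is below its t2.
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]
  mean(head(cand$v2, k))
}

# Iterate over row numbers; df$t2[i] is still a POSIXct value here,
# so no conversion back from character is needed.
df$stat <- vapply(
  seq_len(nrow(df)),
  function(i) f(df$t2[i], df$v1[i], df),
  numeric(1)
)

class(df$t2[1])
df$stat
```

Indexing with df$t2[i] hands f a single date-time value, which is the main difference from apply(df, 1, ...), where the same value would arrive as a string.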

May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.
