How to avoid implicit character conversion when using apply on a data.frame

Question

When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)

But:

 apply(df, 1, function(y) class(y["t2"]))
 ## [1] "character" "character" "character" "character" "character" "character"
 ## [7] "character" "character" "character" "character"

Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?

Edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find the k (e.g. 3) rows with t3 closest to, but lower than, t2 (and the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it on all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?

Recommended answer

Let's wrap up multiple comments into an explanation.

  1. The use of apply converts a data.frame to a matrix. A matrix can hold only one type, so every column is coerced to the least restrictive class. The least restrictive in this case is character.
  2. You're supplying 1 to apply's MARGIN argument. This applies the function by row, which makes you even worse off, because each row now mixes values of different classes. In this scenario you're using apply, which is designed for matrices and data.frames, on what is really a single vector (the t2 column). This is not the right tool for the job.
  3. In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column as seen below:

Code:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

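# sapply() simplifies the per-element classes into a single structure
sapply(df[, "t2"], class)
# lapply() returns one list element per timestamp (output shown below)
lapply(df[, "t2"], class)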

## [[1]]
## [1] "POSIXct" "POSIXt" 
## 
## [[2]]
## [1] "POSIXct" "POSIXt" 
## 
## [[3]]
## [1] "POSIXct" "POSIXt" 
## 
## .
## .
## . 
## 
## [[9]]
## [1] "POSIXct" "POSIXt" 
## 
## [[10]]
## [1] "POSIXct" "POSIXt" 
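
To see points 1 and 2 directly, here is a minimal sketch that reuses the question's toy data. When given a data.frame, apply coerces it with as.matrix before calling the function; the names m and col_classes are only illustrative.

```r
# Same toy data as in the question
df <- data.frame(v = 1:10, t = 1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

# apply() on a data.frame works on as.matrix(df); a matrix has a single type
m <- as.matrix(df)
typeof(m)            # storage type shared by every cell of the matrix
class(m[1, "t2"])    # what function(y) y["t2"] receives inside apply(df, 1, ...)

# Working column by column on the data.frame keeps each column's own class
col_classes <- lapply(df, class)
col_classes$t2
```

Because the timestamp column forces the whole matrix to character, the other columns are converted to strings as well, so any row-wise function would have to convert every field back by hand.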

In general, choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
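
For the row-wise task described in the question's edit, one possible approach is to iterate over row indices instead of calling apply, so each value is taken straight from its column and keeps its class. The sketch below is only an illustration under assumptions: the sample data, the body of f, and the default k = 3 are hypothetical, since the original f is not shown.

```r
# Hypothetical data: two timestamps (t2, t3), a grouping field v1, a value v2
origin <- as.POSIXct("2013-08-13", tz = "UTC")
df <- data.frame(
  v1 = rep(c("a", "b"), each = 5),
  v2 = 1:10,
  t2 = origin + (1:10) * 60,
  t3 = origin + (1:10) * 60 - 30
)

# Hypothetical f: mean of v2 over the k rows with the same v1 whose t3 is
# closest to, but strictly lower than, t2
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0L) return(NA_real_)
  # latest t3 first, i.e. closest below t2
  cand <- cand[order(cand$t3, decreasing = TRUE), ]
  mean(head(cand$v2, k))
}

# Index rows explicitly: df$t2[i] is still POSIXct, so no conversion back
res <- vapply(
  seq_len(nrow(df)),
  function(i) f(df$t2[i], df$v1[i], df),
  numeric(1)
)
res
```

Subsetting a column with [i] preserves its class, which is why no as.POSIXlt round trip is needed inside f. For large data this repeated filtering may be slow, so it is best read as a starting point rather than a tuned solution.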

May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.
