In R, how do you loop over the rows of a data frame really fast?

Question

Suppose that you have a data frame with many rows and many columns.

The columns have names. You want to access rows by number, and columns by name.

For example, one (possibly slow) way to loop over the rows is

for (i in seq_len(nrow(df))) {  # seq_len() also handles a zero-row data frame
  print(df[i, "column1"])       # row by number, column by name
  # do more things with the data frame...
}

Another way is to create "lists" for separate columns (like column1_list = df[["column1"]]) and access those lists in one loop. This approach might be fast, but it is also inconvenient if you want to access many columns.
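A minimal sketch of that column-extraction approach, assuming a small hypothetical data frame whose columns column1 and column2 are made up for illustration. Because a data frame is a list of column vectors, as.list(df) pulls every column out once, so the loop can still reach each column by name without creating one variable per column. Whether this is faster for a particular workload is something to check with system.time().

```r
# Hypothetical example data; the column names are assumptions for illustration
df <- data.frame(column1 = c(1.5, 2.5, 3.5), column2 = c("a", "b", "c"))

# Extract all columns once: as.list() on a data frame gives a named list
# whose elements are the plain column vectors
cols <- as.list(df)

out <- character(nrow(df))  # preallocate the result
for (i in seq_len(nrow(df))) {
  # Index plain vectors by position instead of calling df[i, "column1"] each pass
  out[i] <- paste(cols$column1[i], cols[["column2"]][i])
}
print(out)  # [1] "1.5 a" "2.5 b" "3.5 c"
```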

Is there a fast way of looping over the rows of a data frame? Is some other data structure better for looping fast?

Answer

I think I need to make this a full answer, because I find comments harder to track and I have already lost one comment on this... There is an example by nullglob that demonstrates the differences between for loops and the apply family of functions much better than other examples do. When the function being applied is very slow, that is where all the time goes, and you won't find differences among the looping variants. But when the function is trivial, you can see how much the looping itself influences things.

I'd also like to add that some members of the apply family that were not explored in other examples have interesting performance properties. First, I'll show replications of nullglob's relative results on my machine.

n <- 1e6
sinI <- numeric(n)  # sinI must exist before the loop assigns into it; preallocate
system.time(for (i in 1:n) sinI[i] <- sin(i))
  user  system elapsed 
 5.721   0.028   5.712 

lapply runs much faster for the same computation (note that it returns a list rather than a numeric vector):
system.time(sinI <- lapply(1:n,sin))
   user  system elapsed 
  1.353   0.012   1.361 
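A small sketch of the difference in return types, assuming a reduced n so it runs quickly. lapply() returns a list with one element per input, while vapply(), which this answer does not time, declares the type of each result and returns a numeric vector like the one the for loop fills in. The values are the same in all three cases; only the container differs.

```r
n <- 1e5  # smaller than the answer's 1e6 so the check runs quickly

# Preallocated for loop
sinLoop <- numeric(n)
for (i in 1:n) sinLoop[i] <- sin(i)

# lapply returns a list with one element per input
sinList <- lapply(1:n, sin)

# vapply checks that each result is a single numeric value
# and returns a plain numeric vector
sinVec <- vapply(1:n, sin, numeric(1))

# Same values; unlist() flattens the lapply result for comparison
stopifnot(isTRUE(all.equal(sinLoop, unlist(sinList))))
stopifnot(isTRUE(all.equal(sinLoop, sinVec)))
```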

He also found sapply much slower. Here are some others that weren't tested.

Plain old apply to a matrix version of the data...

mat <- matrix(1:n, ncol = 1)
system.time(sinI <- apply(mat,1,sin))
   user  system elapsed 
  8.478   0.116   8.531 

So the apply() command itself is substantially slower than the for loop. (The for loop is not slowed down appreciably if I use sin(mat[i,1]).)
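A small sketch of the two variants compared above, assuming a reduced n so it runs quickly. It checks that apply() over the rows of a one-column matrix and a preallocated for loop using sin(mat[i,1]) produce the same values. Wrapping each in system.time() repeats the timing comparison, though the numbers will depend on the machine.

```r
n <- 1e5  # smaller than the answer's 1e6 to keep the check quick
mat <- matrix(1:n, ncol = 1)  # one-column matrix, as in the answer

# apply() over rows calls sin() once per row and simplifies to a vector
sinApply <- apply(mat, 1, sin)

# Equivalent preallocated for loop that indexes the matrix directly
sinLoop <- numeric(n)
for (i in seq_len(nrow(mat))) sinLoop[i] <- sin(mat[i, 1])

stopifnot(isTRUE(all.equal(sinApply, sinLoop)))
```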

Another one that doesn't seem to be tested in other posts is tapply.

system.time(sinI <- tapply(1:n, 1:n, sin))
   user  system elapsed 
 12.908   0.266  13.589 

Of course, one would never use tapply this way, and in most cases its usefulness far outweighs any such speed concern.
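For contrast, a minimal sketch of the kind of job tapply() is designed for, using hypothetical values and group labels: applying a function to the subsets of a vector defined by a grouping variable, rather than to every element individually.

```r
# Hypothetical data: one value per observation and a group label for each
values <- c(2, 4, 6, 1, 3)
groups <- c("a", "a", "a", "b", "b")

# tapply() splits values by groups and applies mean() to each subset,
# returning one result per group, named by group
groupMeans <- tapply(values, groups, mean)
print(groupMeans)
```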
