How to append rows to an R data frame

Question

I have looked around StackOverflow, but I cannot find a solution specific to my problem, which involves appending rows to an R data frame.

I am initializing an empty 2-column data frame, as follows.

df = data.frame(x = numeric(), y = character())

Then, my goal is to iterate through a list of values and, in each iteration, append a value to the end of the data frame. I started with the following code.

for (i in 1:10) {
    df$x = rbind(df$x, i)
    df$y = rbind(df$y, toString(i))
}

I also attempted the functions c, append, and merge without success. Please let me know if you have any suggestions.

Update from comment: I don't presume to know how R was meant to be used, but I wanted to avoid the additional line of code that would be required to update the indices on every iteration, and I cannot easily preallocate the size of the data frame because I don't know how many rows it will ultimately take. Remember that the above is merely a toy example meant to be reproducible. Either way, thanks for your suggestion!

Recommended answer

Update

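Not knowing what you are trying to do, I'll share one more suggestion: preallocate a vector of the required type for each column, insert the values into those vectors, and then create your data.frame at the end.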

Continuing with Julian's f3 (a preallocated data.frame) as the fastest option so far, defined as:

# pre-allocate space
f3 <- function(n){
  df <- data.frame(x = numeric(n), y = character(n), stringsAsFactors = FALSE)
  for(i in 1:n){
    df$x[i] <- i
    df$y[i] <- toString(i)
  }
  df
}

Here's a similar approach, but one where the data.frame is created as the last step.

# Use preallocated vectors
f4 <- function(n) {
  x <- numeric(n)
  y <- character(n)
  for (i in 1:n) {
    x[i] <- i
    y[i] <- i
  }
  data.frame(x, y, stringsAsFactors=FALSE)
}
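
One detail in f4 may be worth spelling out: it assigns the integer i directly into the character vector y, whereas f3 calls toString(i). A minimal sketch of what happens on that assignment (the variable names here are only for illustration):

```r
# Assigning a number into a character vector coerces it to a string,
# so y stays a character vector throughout the loop in f4.
y <- character(3)
y[2] <- 2L

is.character(y)          # TRUE
identical(y[2], toString(2L))  # TRUE for this integer value
```

For the integer loop counter used here the two spellings give the same strings, so f3 and f4 fill the y column with the same values.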

microbenchmark from the "microbenchmark" package will give us more comprehensive insight than system.time:

library(microbenchmark)
microbenchmark(f1(1000), f3(1000), f4(1000), times = 5)
# Unit: milliseconds
#      expr         min          lq      median         uq         max neval
#  f1(1000) 1024.539618 1029.693877 1045.972666 1055.25931 1112.769176     5
#  f3(1000)  149.417636  150.529011  150.827393  151.02230  160.637845     5
#  f4(1000)    7.872647    7.892395    7.901151    7.95077    8.049581     5

f1() (the approach below) is incredibly inefficient because of how often it calls data.frame and because growing objects that way is generally slow in R. f3() is much improved due to preallocation, but the data.frame structure itself might be part of the bottleneck here. f4() tries to bypass that bottleneck without compromising the approach you want to take.
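
f4 needs n up front, while the question says the final row count is not known. One way to keep the "vectors first, data.frame last" idea in that situation is to start with a guessed capacity and double it whenever the vectors fill up. The following is only a sketch and not part of the original answer: collect_rows, make_source, next_value, and initial_capacity are hypothetical names, and the convention that the source returns NULL when it is exhausted is an assumption.

```r
# Hypothetical helper: collect an unknown number of rows in plain vectors.
# `next_value` is an assumed callback that returns NULL when nothing is left.
collect_rows <- function(next_value, initial_capacity = 16) {
  x <- numeric(initial_capacity)
  y <- character(initial_capacity)
  n <- 0
  repeat {
    value <- next_value()
    if (is.null(value)) break
    if (n == length(x)) {
      # Out of room: double the capacity of both vectors (padded with NA).
      length(x) <- 2 * length(x)
      length(y) <- 2 * length(y)
    }
    n <- n + 1
    x[n] <- value
    y[n] <- toString(value)
  }
  # Trim the unused slots and build the data.frame once, at the end.
  data.frame(x = x[seq_len(n)], y = y[seq_len(n)], stringsAsFactors = FALSE)
}

# Toy source that yields 1, 2, ..., limit and then signals the end with NULL.
make_source <- function(limit) {
  i <- 0
  function() {
    if (i >= limit) return(NULL)
    i <<- i + 1
    i
  }
}

result <- collect_rows(make_source(10), initial_capacity = 4)
```

Doubling keeps the number of reallocations small relative to the number of rows, and the index bookkeeping stays inside the helper rather than in the calling loop. It has not been benchmarked against f3 or f4 here.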

This is really not a good idea, but if you wanted to do it this way, I guess you can try:

for (i in 1:10) {
  df <- rbind(df, data.frame(x = i, y = toString(i)))
}
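
The benchmark above calls f1(), but its definition does not appear in this answer. Assuming it is simply the rbind loop above wrapped in a function that takes the number of rows, it might look like this (the body is a guess, not the benchmarked code itself):

```r
# Assumed definition of f1: grow the data.frame one row at a time with rbind.
f1 <- function(n) {
  df <- data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    # Each iteration builds a one-row data.frame and copies df into a new object.
    df <- rbind(df, data.frame(x = i, y = toString(i), stringsAsFactors = FALSE))
  }
  df
}

res <- f1(10)
nrow(res)  # 10
```

Every pass through the loop constructs a new data.frame and copies all rows accumulated so far, which is consistent with f1 being the slowest entry in the timings.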

Note that in your code, there is one other problem:

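  • If you do not want characters converted to factors, you should set stringsAsFactors = FALSE. Use: df = data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE). (In R 4.0.0 and later, stringsAsFactors already defaults to FALSE, so this matters mainly on older versions.)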
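
The effect of stringsAsFactors on the empty data frame can be checked directly. This small sketch builds it both ways and inspects the type of the y column (df_chr and df_fct are illustrative names):

```r
# Same empty 2-column data frame, with and without factor conversion.
df_chr <- data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
df_fct <- data.frame(x = numeric(), y = character(), stringsAsFactors = TRUE)

class(df_chr$y)  # "character"
class(df_fct$y)  # "factor"
```

Passing the argument explicitly keeps the behaviour the same regardless of which R version runs the code.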
