Split up a dataframe by number of rows
I have a dataframe made up of 400'000 rows and about 50 columns. As this dataframe is so large, it is too computationally taxing to work with. I would like to split this dataframe up into smaller ones, after which I will run the functions I would like to run, and then reassemble the dataframe at the end.
There is no grouping variable that I would like to use to split up this dataframe. I would just like to split it up by number of rows. For example, I would like to split this 400'000-row table into 400 1'000-row dataframes. How might I do this?
Make your own grouping variable.
d <- split(my_data_frame,rep(1:400,each=1000))
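To illustrate the full round trip the question asks about (split, process, reassemble), here is a minimal sketch on a small made-up data frame. The data frame `df`, its columns, and the per-chunk function `process_chunk` are hypothetical stand-ins and not part of the original answer; the real work done per chunk would replace the body of that function.

```r
# Hypothetical example data: 12 rows, to be split into 4 chunks of 3 rows
df <- data.frame(id = 1:12, x = seq(10, 120, by = 10))

# Grouping vector with one label per row: 1,1,1,2,2,2,...
groups <- rep(1:4, each = 3)
pieces <- split(df, groups)

# Hypothetical per-chunk work: center x within each chunk
process_chunk <- function(chunk_df) {
  chunk_df$x_centered <- chunk_df$x - mean(chunk_df$x)
  chunk_df
}

processed <- lapply(pieces, process_chunk)

# Reassemble the chunks into one data frame and reset the row names
result <- do.call(rbind, processed)
rownames(result) <- NULL
```

Because the grouping labels increase with the row index, the reassembled `result` keeps the original row order.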
You should also consider the ddply function from the plyr package, or the group_by() function from dplyr.
edited for brevity, after Hadley's comments.
If you don't know how many rows are in the data frame, or if the number of rows might not be an exact multiple of your desired chunk size, you can do
chunk <- 1000
n <- nrow(my_data_frame)
r <- rep(1:ceiling(n/chunk),each=chunk)[1:n]
d <- split(my_data_frame,r)
You could also use
r <- ggplot2::cut_width(1:n,chunk,boundary=0)
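As a quick check of the uneven-length case, the sketch below applies the same rep-and-trim idea to a small made-up data frame whose row count is not a multiple of the chunk size. The data frame `df` and the chunk size of 4 are assumptions chosen only to keep the example small.

```r
# Hypothetical data: 10 rows with chunk size 4, so chunks of 4, 4 and 2 rows
df <- data.frame(id = 1:10)
chunk <- 4
n <- nrow(df)

# Build one group label per row, then trim the surplus labels down to n
r <- rep(seq_len(ceiling(n / chunk)), each = chunk)[seq_len(n)]
pieces <- split(df, r)

# Number of rows in each chunk; the last chunk holds the remainder
chunk_sizes <- vapply(pieces, nrow, integer(1))
```

This variant uses `seq_len()` rather than `1:n`, since `1:0` evaluates to `c(1, 0)` and would misbehave on an empty data frame.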
For future readers, methods based on the dplyr and data.table packages will probably be (much) faster for doing group-wise operations on data frames.