使用 dplyr group_by 模拟 split():返回数据帧列表 [英] Emulate split() with dplyr group_by: return a list of data frames

查看:26
本文介绍了使用 dplyr group_by 模拟 split():返回数据帧列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大型数据集,它在 R 中阻塞了 split().我可以使用 dplyr group_by(无论如何这是一种首选方式),但我无法将生成的 grouped_df 作为数据帧列表持久化,这是我连续处理步骤所需的格式(我需要强制转换为 SpatialDataFrames 和类似的).

I have a large dataset that chokes split() in R. I am able to use dplyr group_by (which is a preferred way anyway) but I am unable to persist the resulting grouped_df as a list of data frames, a format required by my consecutive processing steps (I need to coerce to SpatialDataFrames and similar).

考虑一个示例数据集:

df = as.data.frame(cbind(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2)))
listDf = split(df,df$V1)

返回

$a
   V1 V2 V3
 1  a  1  2
 2  a  2  3

$b
   V1 V2 V3
 3  b  3  4
 4  b  4  2

$c
   V1 V2 V3
 5  c  5  2

我想用 group_by(类似于 group_by(df,V1))来模拟这个,但这会返回一个,grouped_df.我知道 do 应该能够帮助我,但我不确定用法(另见 链接进行讨论.)

I would like to emulate this with group_by (something like group_by(df,V1)) but this returns one, grouped_df. I know that do should be able to help me, but I am unsure about usage (also see link for a discussion.)

请注意,split 通过用于建立该组的因素的名称来命名每个列表 - 这是一个所需的功能(最终,对于一种从 dfs 列表中提取这些名称的方法的奖励).

Note that split names each list by the name of the factor that has been used to establish this group - this is a desired function (ultimately, bonus kudos for a way to extract these names from the list of dfs).

推荐答案

group_split in dplyr:

Dplyr 已经实现了 group_split:https://dplyr.tidyverse.org/reference/group_split.html

Dplyr has implemented group_split: https://dplyr.tidyverse.org/reference/group_split.html

它按组拆分数据帧,返回数据帧列表.这些数据帧中的每一个都是由拆分变量的类别定义的原始数据帧的子集.

It splits a dataframe by a groups, returns a list of dataframes. Each of these dataframes are subsets of the original dataframes defined by categories of the splitting variable.

例如.用变量Species分割数据集iris,并计算每个子数据集的摘要:

For example. Split the dataset iris by the variable Species, and calculate summaries of each sub-dataset:

> iris %>% 
+     group_split(Species) %>% 
+     map(summary)
[[1]]
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200   versicolor: 0  
 Median :5.000   Median :3.400   Median :1.500   Median :0.200   virginica : 0  
 Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246                  
 3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300                  
 Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600                  

[[2]]
  Sepal.Length    Sepal.Width     Petal.Length   Petal.Width          Species  
 Min.   :4.900   Min.   :2.000   Min.   :3.00   Min.   :1.000   setosa    : 0  
 1st Qu.:5.600   1st Qu.:2.525   1st Qu.:4.00   1st Qu.:1.200   versicolor:50  
 Median :5.900   Median :2.800   Median :4.35   Median :1.300   virginica : 0  
 Mean   :5.936   Mean   :2.770   Mean   :4.26   Mean   :1.326                  
 3rd Qu.:6.300   3rd Qu.:3.000   3rd Qu.:4.60   3rd Qu.:1.500                  
 Max.   :7.000   Max.   :3.400   Max.   :5.10   Max.   :1.800                  

[[3]]
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.900   Min.   :2.200   Min.   :4.500   Min.   :1.400   setosa    : 0  
 1st Qu.:6.225   1st Qu.:2.800   1st Qu.:5.100   1st Qu.:1.800   versicolor: 0  
 Median :6.500   Median :3.000   Median :5.550   Median :2.000   virginica :50  
 Mean   :6.588   Mean   :2.974   Mean   :5.552   Mean   :2.026                  
 3rd Qu.:6.900   3rd Qu.:3.175   3rd Qu.:5.875   3rd Qu.:2.300                  
 Max.   :7.900   Max.   :3.800   Max.   :6.900   Max.   :2.500     

它对于调试嵌套数据帧上的计算也非常有帮助,因为它是查看"嵌套数据帧内部"计算中发生的事情的快速方法.

It is also very helpful for debugging a calculations on nested dataframes, because it is an quick way to "see" what is going on "inside" the calculations on nested dataframes.

这篇关于使用 dplyr group_by 模拟 split():返回数据帧列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆