Multiple data frame handling

Problem description

I have several data frames named plant1_wd_hrly, plant2_wd_hrly, plant3_wd_hrly, and so on. Each of them holds data like this:

                     time temp
    1 2012-01-01 00:00:00   20
    2 2012-01-01 01:00:00   21
    3 2012-01-01 02:00:00   22
    4 2012-01-01 03:00:00   23
    5 2012-01-01 04:00:00   24

I need to aggregate each of them to the daily level and calculate the daily mean, max, and min. Here is the code to generate such data frames:

      # Hourly timestamps covering three days
      x = seq(
          from = as.POSIXct("2012-1-1 0:00", tz = "UTC"),
          to = as.POSIXct("2012-1-3 23:00", tz = "UTC"),
          by = "hour")
      plant1_wd_hrly = data.frame("time" = x, "temp" = seq(20, length.out = length(x)))
      # Keep only the date part so that rows can be grouped by day
      plant1_wd_hrly$time = as.POSIXct(substr(plant1_wd_hrly$time, 1, 10))
      plant2_wd_hrly = data.frame("time" = x, "temp" = seq(25, length.out = length(x)))
      plant2_wd_hrly$time = as.POSIXct(substr(plant2_wd_hrly$time, 1, 10))
      # Introduce a few missing values
      plant1_wd_hrly$temp[2:3] = NA
      plant2_wd_hrly$temp[5:6] = NA

With only one data frame, I usually do the aggregation with the dplyr package:

      library(dplyr)

      # Daily mean, max and min; NA readings are ignored
      plant1_hrly = plant1_wd_hrly %>% group_by(time) %>% summarise(
                            temp_avg = mean(temp, na.rm = TRUE),
                            temp_max = max(temp, na.rm = TRUE),
                            temp_min = min(temp, na.rm = TRUE))

But with multiple data frames, what is a more efficient way to do this? My first thought is a for loop. Can I load a variable by a dynamically generated name in R, so that I can loop through the different data frames, since their names are all very similar? To assign a value to a dynamically generated variable name I could use assign, but how do I load one?

Thanks.

Recommended answer

Make a vector of the data frame names, for instance:

df_names <- grep("plant", ls(), value = TRUE)

This works if no other variable names contain "plant". Otherwise you need a more specific regular expression, or you can pick the names by hand.
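As a rough sketch of what a stricter pattern could look like: the anchored regular expression below matches only names of the form plant<number>_wd_hrly, so objects such as plant1_hrly or plant1_wd_hrly_agr are left out. The dummy objects and the exact pattern are assumptions for illustration; adjust the pattern to your own naming scheme.

```r
# Hypothetical objects standing in for a real workspace
plant1_wd_hrly <- data.frame(temp = 1)
plant2_wd_hrly <- data.frame(temp = 2)
plant1_hrly <- data.frame(temp = 3)         # should NOT be picked up
plant1_wd_hrly_agr <- data.frame(temp = 4)  # should NOT be picked up

# ^ and $ anchor the pattern to the whole name; [0-9]+ matches the plant number
df_names <- grep("^plant[0-9]+_wd_hrly$", ls(), value = TRUE)
print(df_names)
```

Anchoring the pattern also keeps the vector stable if the code is run a second time, after aggregated copies with a suffix have been created.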

Then loop over the names, using get() and assign() in the body. get() takes a name as a string and returns the value of the variable with that name. assign() takes a name and a value and assigns the value to that name.

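for (df_n in df_names) {
  # Look up the data frame by name and aggregate it to daily values
  temp_data <- get(df_n) %>% group_by(time) %>% summarise(
    temp_avg = mean(temp, na.rm = TRUE),
    temp_max = max(temp, na.rm = TRUE),
    temp_min = min(temp, na.rm = TRUE))

  # Store the result under a new name, e.g. plant1_wd_hrly_agr
  assign(paste0(df_n, "_agr"), temp_data)
}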
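
An alternative sketch, not part of the answer above: instead of creating one new variable per data frame with assign(), the data frames can be collected into a named list with mget() and processed with lapply(). The small stand-in data frames, the helper name daily_summary, and the "plant" column name are assumptions made for illustration only.

```r
library(dplyr)

# Small stand-in data frames (hypothetical values, two days of two readings each)
days <- as.Date(c("2012-01-01", "2012-01-01", "2012-01-02", "2012-01-02"))
plant1_wd_hrly <- data.frame(time = days, temp = c(20, NA, 22, 23))
plant2_wd_hrly <- data.frame(time = days, temp = c(25, 26, NA, 28))

# The same daily aggregation, wrapped in a helper (the name is illustrative)
daily_summary <- function(df) {
  df %>%
    group_by(time) %>%
    summarise(
      temp_avg = mean(temp, na.rm = TRUE),
      temp_max = max(temp, na.rm = TRUE),
      temp_min = min(temp, na.rm = TRUE)
    )
}

df_names <- c("plant1_wd_hrly", "plant2_wd_hrly")

# mget() looks up several variables by name and returns them as a named list
df_list <- mget(df_names)

# Apply the helper to every data frame; the list names are preserved
daily_list <- lapply(df_list, daily_summary)

# Optionally stack the results, keeping the source name in a "plant" column
daily_all <- bind_rows(daily_list, .id = "plant")
print(daily_all)
```

Keeping the results in one list (or one stacked data frame) avoids filling the workspace with generated variable names, and a single result can still be reached with daily_list[["plant1_wd_hrly"]].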
