Do a loop with different versions of a dataset based on a variable id and save the result after each loop


Problem description


  • I have a dataset with x countries over y years.
  • I would like to do a certain analysis (see indicated below, but this code is not the problem)
  • The problem: I would like to run this analysis, using the code I already have, several times: each time on a different dataset that contains another combination of the x countries and y years. To be clear: I would like to do the analysis for EACH possible combination of the x countries and the y years.

The code that I would like to execute for each version of the dataset (the dataset itself is explained further below):

library(stats)
##### the analysis for one dataset ####
# For each threshold i, keep the rows with rainfed <= i and sum each season.
# (The sample data below has no `rainfed` column; `irrigation` appears to be its counterpart there.)
o <- lapply(1:999, function(i) {
  Alldata_Rainfed <- subset(Alldata, rainfed <= i)

  # sum over the subset, not over the full Alldata
  c(outcome_spring = sum(Alldata_Rainfed$spring),
    outcome_summer = sum(Alldata_Rainfed$summer),
    outcome_autumn = sum(Alldata_Rainfed$autumn),
    outcome_winter = sum(Alldata_Rainfed$winter))
})

combination <- as.data.frame(do.call(rbind, o)) # the output I want is another dataset for each unique dataset
#### the end of the analysis for one dataset ####



Desired output

That means that as output I need as many datasets (named "combination" in the example) as there are possible combinations of the x countries and y years.

> dput(Alldata)
structure(list(country = c("belgium", "belgium", "belgium", "belgium", 
"germany", "germany", "germany", "germany"), year = c(2004, 2005, 
2005, 2013, 2005, 2009, 2013, 2013), spring = c(23, 24, 45, 23, 
1, 34, 5, 23), summer = c(25, 43, 654, 565, 23, 1, 23, 435), 
    autumn = c(23, 12, 4, 12, 24, 64, 23, 12), winter = c(34, 
    45, 64, 13, 346, 74, 54, 45), irrigation = c(10, 30, 40, 
    300, 288, 500, 996, 235), id = c(1, 2, 2, 3, 4, 5, 6, 6)), datalabel = "", time.stamp = "14 Nov 2016 20:09", .Names = c("country", 
"year", "spring", "summer", "autumn", "winter", "irrigation", 
"id"), formats = c("%9s", "%9.0g", "%9.0g", "%9.0g", "%9.0g", 
"%9.0g", "%9.0g", "%9.0g"), types = c(7L, 254L, 254L, 254L, 254L, 
254L, 254L, 254L), val.labels = c("", "", "", "", "", "", "", 
""), var.labels = c("", "", "", "", "", "", "", "group(country year)"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8"), version = 12L, class = "data.frame")

In the example above, I already made an id for combining country and year. That means I want to make datasets with all observations that have combinations of the following ids:


  • dataset 1_2_3_4_5: ids 1, 2, 3, 4, 5 (so this dataset only misses the observations with id = 6)
  • dataset 1_2_3_4_6: ids 1, 2, 3, 4, 6 (but not 5)
  • dataset 1_2: ids 1, 2 (but not all the rest)
  • dataset 3_4_5: ids 3, 4, 5 (but not all the rest)
  • ....

And so on. Note that I named each dataset after the ids it includes; otherwise it will be hard for me to tell the different datasets apart. Other names are fine too, as long as I can distinguish between the datasets!
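The naming scheme described above can be sketched as follows. This is only an illustration: the variable names `ids_in_dataset`, `label` and `recovered` are made up here, and the underscore separator is taken from the dataset names in the list above.

```r
# ids contained in one version of the dataset
ids_in_dataset <- c(1, 2, 3, 4, 5)

# build a label such as "1_2_3_4_5" from the ids
label <- paste(ids_in_dataset, collapse = "_")
label

# the ids can be recovered from the label later on
recovered <- as.numeric(strsplit(label, "_", fixed = TRUE)[[1]])
recovered

# without a separator, different id sets can collide once ids reach two digits
paste(c(1, 23), collapse = "") == paste(c(12, 3), collapse = "")  # TRUE: both are "123"
```

A separator therefore keeps the labels unambiguous even if the real data has more than nine ids.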

Thanks for your help!

Edit: it might be possible that certain datasets give no results (because the second loop also loops over irrigation, and certain combinations might have no matching irrigation values), but then the output should just be a dataset with missing values.
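One way to get missing values for an empty selection is sketched below. It assumes the threshold column is `irrigation`, as in the sample data; `season_sums` and `toy` are hypothetical names introduced only for this example. Note that `sum()` of an empty vector is 0 in R, so the NA has to be returned explicitly.

```r
# Hypothetical helper: seasonal sums of the rows whose irrigation is at or below a threshold
season_sums <- function(df, threshold) {
  seasons <- c("spring", "summer", "autumn", "winter")
  keep <- !is.na(df$irrigation) & df$irrigation <= threshold
  if (!any(keep)) {
    # nothing qualifies: return NA instead of the 0 that sum() gives for empty input
    return(setNames(rep(NA_real_, length(seasons)), seasons))
  }
  vapply(seasons, function(s) sum(df[[s]][keep]), numeric(1))
}

# tiny toy dataset (first two rows of the question's sample data)
toy <- data.frame(spring = c(23, 24), summer = c(25, 43),
                  autumn = c(23, 12), winter = c(34, 45),
                  irrigation = c(10, 30))

season_sums(toy, 5)   # no row has irrigation <= 5, so all four values are NA
season_sums(toy, 10)  # only the first row qualifies
```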

Recommended answer

Not sure if this is the most efficient way of doing this, but I think it should work:

# create a df to store the results of all combinations
result=data.frame()

The loops below are based on the combn() function: combn(x, m) returns all possible combinations of m elements taken from the vector x (here the ids), one combination per column.
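As a small illustration of what combn() returns, here is a sketch on a three-element vector; the names `ids` and `pairs` are made up for this example.

```r
ids <- c(1, 2, 3)

# all combinations of 2 ids; each COLUMN of the result is one combination
pairs <- combn(ids, 2)
pairs
ncol(pairs)   # 3 combinations: (1,2), (1,3), (2,3)

# number of combinations of size 2 up to 6 for six distinct ids
sum(choose(6, 2:6))   # 57
```

One thing to keep in mind: if `x` is a single positive integer, combn() treats it as `seq_len(x)`, so it is safest to pass the vector of ids itself.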

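ids <- unique(Alldata$id)   # the dataset from the question (called `o` in the original answer)
for (i in 2:length(ids)) {             # combination sizes: 2 up to all ids
  combis <- combn(ids, i)              # one combination per column
  for (j in seq_len(ncol(combis))) {
    sub <- Alldata[Alldata$id %in% combis[, j], ]
    out <- sub[1, ]    # placeholder: use your own function on `sub` here
    out$label <- paste(combis[, j], collapse = "_") # provide an id so you know for which combination this result is
    result <- rbind(result, out) # paste it to previous output
  }
}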
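Putting the pieces together, the following is a rough sketch rather than a definitive solution. It rebuilds the sample data from the question (without the Stata attributes), assumes the threshold column is `irrigation`, and uses a hypothetical function `analyse()` as a stand-in for the asker's analysis. Each result is stored in a named list, so there is one dataset per combination.

```r
# Sample data from the question (Stata attributes from dput() omitted)
Alldata <- data.frame(
  country = rep(c("belgium", "germany"), each = 4),
  year = c(2004, 2005, 2005, 2013, 2005, 2009, 2013, 2013),
  spring = c(23, 24, 45, 23, 1, 34, 5, 23),
  summer = c(25, 43, 654, 565, 23, 1, 23, 435),
  autumn = c(23, 12, 4, 12, 24, 64, 23, 12),
  winter = c(34, 45, 64, 13, 346, 74, 54, 45),
  irrigation = c(10, 30, 40, 300, 288, 500, 996, 235),
  id = c(1, 2, 2, 3, 4, 5, 6, 6)
)

# Hypothetical stand-in for the asker's analysis:
# one row of seasonal sums per irrigation threshold
analyse <- function(df, thresholds = 1:999) {
  rows <- lapply(thresholds, function(i) {
    keep <- df$irrigation <= i
    c(spring = sum(df$spring[keep]), summer = sum(df$summer[keep]),
      autumn = sum(df$autumn[keep]), winter = sum(df$winter[keep]))
  })
  as.data.frame(do.call(rbind, rows))
}

ids <- sort(unique(Alldata$id))
results <- list()
for (m in 2:length(ids)) {
  combis <- combn(ids, m)                       # one combination per column
  for (j in seq_len(ncol(combis))) {
    label <- paste(combis[, j], collapse = "_") # e.g. "1_2_3"
    subset_j <- Alldata[Alldata$id %in% combis[, j], ]
    results[[label]] <- analyse(subset_j)       # one dataset per combination
  }
}

length(results)          # number of combinations of size 2 or more
head(results[["1_2"]])   # result for the dataset with ids 1 and 2
```

Storing the results in a list avoids growing a data frame with rbind() inside the loop, and `results[["3_4_5"]]` retrieves the result for any particular combination by its label. To include single-id datasets as well, the outer loop could start at 1 instead of 2.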
