Making multiple named data frames with loop


Problem description


    In the process of learning. Didn't ask my first question well, so I'm trying again and doing my best to be more clear.

    I'm trying to create a series of data frames for a reproducible question for my larger issue. I would like to make 4 data frames, each named differently by the year. Eventually I will merge these four data frames to explain where I am encountering my issue.

    Here is the most recent solution. This runs, but it creates a list of four data frames instead of four separate data frames in the global environment.

     datafrom <- list()
     years <- c(2006,2008,2010,2012)
    
     for (i in 1:length(years)) {
      UniqueID <- 1:10 # <- Not all numeric - Kept as character vector
      Name <- LETTERS[seq( from = 1, to = 10 )]
      Entity_Type <- factor(c("This","That"))
      Data1 <- rnorm(10)     
      Data2 <- rnorm(10) 
      Data3 <- rnorm(10) 
      Data4 <- rnorm(10) 
      Year <- years[i]
      datafrom[[i]] <- data.frame(UniqueID, Name, Entity_Type, Data1, Data2, Data3, Data4, Year)
     }
    

    I would like 4 separate data frames, each named datafrom2006, datafrom2008, etc.

    Many thanks in advance for your patience with my learning.

    Solution

    I'll demonstrate a few (of many) techniques here, and I'll call them (1) brute force, (2) list-based, and (3) single long-form data.frame.

    I'll add to the example the use of a function that you want to apply to each data.frame. Though contrived, it helps make the point:

    ## some constants used throughout
    years <- c(2006, 2008, 2010, 2012)
    n <- 10
    myfunc <- function(x) {
        interestingPart <- x[ , grepl('^Data', colnames(x)) ]
        sapply(interestingPart, mean)
    }
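
To make it concrete what myfunc returns, here is a minimal sketch on a toy data.frame. The toy values are made up for illustration, and the drop = FALSE is my addition (not in the function above): it keeps the subset a data.frame even when only one column matches.

```r
# Toy data.frame (hypothetical values): two Data* columns and one other column
toy <- data.frame(Name  = c("A", "B", "C"),
                  Data1 = c(1, 2, 3),
                  Data2 = c(10, 20, 30))

myfunc <- function(x) {
    # keep only the columns whose names start with "Data";
    # drop = FALSE (an addition) avoids collapsing a single column to a vector
    interestingPart <- x[ , grepl('^Data', colnames(x)), drop = FALSE]
    # one mean per remaining column, returned as a named numeric vector
    sapply(interestingPart, mean)
}

res <- myfunc(toy)
print(res)
## Data1 Data2 
##     2    20 
```

The Name column is ignored because it does not match the '^Data' pattern.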
    

    Brute Force

    Yes, you can create multiple like-named and same-structure data.frames from a loop, though it is typically frowned upon by many experienced (R?) programmers:

    set.seed(42)
    for (yr in years) {
        tmpdf <- data.frame(UniqueID=as.character(1:n),
                            Name=LETTERS[1:n],
                            Entity_Type=factor(c('this', 'that')),
                            Data1=rnorm(n),
                            Data2=rnorm(n),
                            Data3=rnorm(n),
                            Data4=rnorm(n),
                            Year=yr)
        assign(sprintf('datafrom%s', yr), tmpdf)
    }
    rm(yr, tmpdf)
    
    ls()
    ## [1] "datafrom2006" "datafrom2008" "datafrom2010" "datafrom2012" "myfunc"      
    ## [6] "n"            "years"       
    
    head(datafrom2006, n=2)
    ##   UniqueID Name Entity_Type      Data1      Data2      Data3      Data4 Year
    ## 1        1    A        this  1.3709584  1.3048697 -0.3066386  0.4554501 2006
    ## 2        2    B        that -0.5646982  2.2866454 -1.7813084  0.7048373 2006
    

    In order to see the results for each data.frame, one would typically (though not always) do something like this:

    myfunc(datafrom2006)
    ##      Data1      Data2      Data3      Data4 
    ##  0.5472968 -0.1634567 -0.1780795 -0.3639041 
    myfunc(datafrom2008)
    ##       Data1       Data2       Data3       Data4 
    ## -0.02021535  0.01839391  0.53907680 -0.21787537 
    myfunc(datafrom2010)
    ##       Data1       Data2       Data3       Data4 
    ##  0.25110630 -0.08719458  0.22924781 -0.19857243 
    myfunc(datafrom2012)
    ##      Data1      Data2      Data3      Data4 
    ## -0.7949660  0.2102418 -0.2022066 -0.2458678 
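
If you already have loose variables like these, one way to avoid repeating the call is to collect them back into a list with mget. This is only a sketch: the data frames are smaller stand-ins for the ones above, and dfnames, dflist and means are names I made up.

```r
years <- c(2006, 2008, 2010, 2012)

# Recreate loose variables as in the brute-force loop (small stand-in frames)
set.seed(42)
for (yr in years) {
    assign(sprintf('datafrom%s', yr),
           data.frame(Data1 = rnorm(3), Data2 = rnorm(3), Year = yr))
}

# Build the variable names, then fetch them all at once into a named list
dfnames <- sprintf('datafrom%s', years)
dflist <- mget(dfnames)

# One call now covers every year
means <- lapply(dflist, function(x) colMeans(x[ , c('Data1', 'Data2')]))
names(means)
## [1] "datafrom2006" "datafrom2008" "datafrom2010" "datafrom2012"
```

This works, but it is a detour: building the list directly skips the assign/mget round trip.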
    

    List-Based

    set.seed(42)
    datafrom <- sapply(as.character(years), function(yr) {
                           data.frame(UniqueID=as.character(1:n),
                                      Name=LETTERS[1:n],
                                      Entity_Type=factor(c('this', 'that')),
                                      Data1=rnorm(n),
                                      Data2=rnorm(n),
                                      Data3=rnorm(n),
                                      Data4=rnorm(n),
                                      Year=yr)
                       }, simplify=FALSE)
    str(datafrom)
    ## List of 4
    ##  $ 2006:'data.frame':    10 obs. of  8 variables:
    ##   ..$ UniqueID   : Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
    ##   ..$ Name       : Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
    ##   ..$ Entity_Type: Factor w/ 2 levels "that","this": 2 1 2 1 2 1 2 1 2 1
    ##   ..$ Data1      : num [1:10] 1.371 -0.565 0.363 0.633 0.404 ...
    ##   ..$ Data2      : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
    ##   ..$ Data3      : num [1:10] -0.307 -1.781 -0.172 1.215 1.895 ...
    ##   ..$ Data4      : num [1:10] 0.455 0.705 1.035 -0.609 0.505 ...
    ##   ..$ Year       : Factor w/ 1 level "2006": 1 1 1 1 1 1 1 1 1 1
    ##  $ 2008:'data.frame':    10 obs. of  8 variables:
    ##   ..$ UniqueID   : Factor w/ 10 levels "1","10","2","3",..: 1 3 4 5 6 7 8 9 10 2
    #### ...snip...
    
    head(datafrom[[1]], n=2)
    ##   UniqueID Name Entity_Type      Data1      Data2      Data3      Data4 Year
    ## 1        1    A        this  1.3709584  1.3048697 -0.3066386  0.4554501 2006
    ## 2        2    B        that -0.5646982  2.2866454 -1.7813084  0.7048373 2006
    
    head(datafrom[['2008']], n=2)
    ##   UniqueID Name Entity_Type      Data1       Data2      Data3       Data4 Year
    ## 1        1    A        this  0.2059986  0.32192527 -0.3672346 -1.04311894 2008
    ## 2        2    B        that -0.3610573 -0.78383894  0.1852306 -0.09018639 2008
    

    With a list, however, you can test your function on just one of the data.frames:

    myfunc(datafrom[[1]])
    myfunc(datafrom[['2010']])
    

    and then run the function on all of them very simply:

    lapply(datafrom, myfunc)
    ## $`2006`
    ##      Data1      Data2      Data3      Data4 
    ##  0.5472968 -0.1634567 -0.1780795 -0.3639041 
    ## $`2008`
    ##       Data1       Data2       Data3       Data4 
    ## -0.02021535  0.01839391  0.53907680 -0.21787537 
    ## $`2010`
    ##       Data1       Data2       Data3       Data4 
    ##  0.25110630 -0.08719458  0.22924781 -0.19857243 
    ## $`2012`
    ##      Data1      Data2      Data3      Data4 
    ## -0.7949660  0.2102418 -0.2022066 -0.2458678 
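
If you still want separately named variables (datafrom2006 and so on) after building the list, list2env can create them in one step. A minimal sketch, assuming smaller stand-in data frames; named and target are hypothetical names, and I write into a fresh environment here only to keep the workspace clean.

```r
years <- c(2006, 2008, 2010, 2012)
set.seed(42)
datafrom <- sapply(as.character(years), function(yr) {
    data.frame(Data1 = rnorm(3), Year = yr)
}, simplify = FALSE)

# Rename the list elements to the variable names asked for in the question
named <- setNames(datafrom, paste0('datafrom', names(datafrom)))

# Copy each element into an environment as its own variable.
# Use envir = globalenv() instead to create them in the global environment.
target <- new.env()
list2env(named, envir = target)

ls(target)
## [1] "datafrom2006" "datafrom2008" "datafrom2010" "datafrom2012"
```

That keeps the list as the working structure and treats the named variables as a convenience on top of it.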
    

    Long-form Data

    If instead you keep all of the data in the same data.frame, using your already-defined column of Year, you can still segment it for exploring individual years:

    longdf <- do.call('rbind.data.frame', datafrom)
    rownames(longdf) <- NULL
    longdf[c(1,11,21,31),]
    ##    UniqueID Name Entity_Type      Data1     Data2      Data3       Data4 Year
    ## 1         1    A        this  1.3709584 1.3048697 -0.3066386  0.45545012 2006
    ## 11        1    A        this  0.2059986 0.3219253 -0.3672346 -1.04311894 2008
    ## 21        1    A        this  1.5127070 1.3921164  1.2009654 -0.02509255 2010
    ## 31        1    A        this -1.4936251 0.5676206 -0.0861073 -0.04069848 2012
    

    Simple subsets:

    • subset(longdf, Year == 2006), though subset has its pros and cons.
    • by(longdf, longdf$Year, myfunc)
    • If using library(dplyr), try longdf %>% filter(Year == 2010) %>% myfunc()

    (Side note: when trying to plot aggregate data, it's often easier when the data is in this form, especially when using ggplot2-like layering and aesthetics.)
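
Two more base-R options for long-form data, sketched on a small made-up longdf (the values and the names yearly and byyear are illustrative only): aggregate for per-year summaries, and split to get back to a list of per-year data.frames.

```r
set.seed(42)
longdf <- data.frame(Year  = rep(c(2006, 2008), each = 3),
                     Data1 = rnorm(6),
                     Data2 = rnorm(6))

# Per-year means of several columns in one call (formula interface of aggregate)
yearly <- aggregate(cbind(Data1, Data2) ~ Year, data = longdf, FUN = mean)
print(yearly)

# Going back from long form to a named list of per-year data.frames
byyear <- split(longdf, longdf$Year)
names(byyear)
## [1] "2006" "2008"
```

So long form and list form are interchangeable: rbind goes one way, split goes the other.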

    Rationale Against "Brute Force"

    In answer to your comment question, when making different variables with the same structure, it is easy to deduce that you will be doing the same thing to each of them, in turn or immediately-consecutively. As a general programming principle, many try to generalize what they do so that if it can be done once, it can be done an arbitrary number of times without (heavily) adjusting the code. For instance, compare what was necessary in applying myfunc in the first two examples above.

    Further, if you later want to aggregate the results from your calls to myfunc, it is more laborious in the "brute force" example (as you must capture each return and combine manually), whereas the other two techniques can use simpler summarizing functions (e.g., another lapply, or perhaps Reduce or Filter).
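
As a rough illustration of that last point, here is how the per-year results could be combined when the data lives in a list. It is a sketch on smaller stand-in data; results and positive are hypothetical names, and the drop = FALSE in myfunc is my addition.

```r
years <- c(2006, 2008, 2010, 2012)
set.seed(42)
datafrom <- sapply(as.character(years), function(yr) {
    data.frame(Data1 = rnorm(5), Data2 = rnorm(5), Year = yr)
}, simplify = FALSE)

myfunc <- function(x) {
    sapply(x[ , grepl('^Data', colnames(x)), drop = FALSE], mean)
}

# Stack the per-year vectors: one row per year, one column per Data* variable
results <- do.call(rbind, lapply(datafrom, myfunc))
dim(results)
## [1] 4 2

# Filter keeps only the list elements for which the predicate is TRUE,
# e.g. the years whose Data1 mean is positive
positive <- Filter(function(x) mean(x$Data1) > 0, datafrom)
names(positive)
```

With four loose variables, the same aggregation would mean capturing four return values by hand and binding them yourself.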
