How to save data frames in a list


Question


I have data frames that are produced on different iterations of my code, say about 100 iterations. On each iteration I write the new data frame to df, which I use to hold the incoming frame.

The data frames are:

First iteration

  V1   V2   V3   V4
  5.1  3.5  1.4  0.2
  4.9  3.0  1.4  0.2
  4.7  3.2  1.3  0.2
  4.6  3.1  1.5  0.2
  5.0  3.6  1.4  0.2

Second iteration

  V1   V2   V3   V4
  5.1  3.5  1.4  0.2
  4.9  3.0  1.4  0.2
  4.7  3.2  1.3  0.2
  4.6  3.1  1.5  0.2
  5.0  3.6  1.4  0.2

Third iteration

  V1   V2   V3   V4
  5.1  3.5  1.4  0.2
  4.9  3.0  1.4  0.2
  4.7  3.2  1.3  0.2
  4.6  3.1  1.5  0.2
  5.0  3.6  1.4  0.2

and so on.

At the end I want to have all the data frames in a list so that I can process the list in other operations. How do I do this?

Here is a sample code

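data  <- list.files(pattern = "\\.csv$")    # names of the CSV files in the working directory
data1 <- lapply(data, function(x) read.csv(x, header = TRUE))
files <- length(data1)                       # number of files that were read
for (i in seq_len(files))                    # one iteration per file
{
  ......
  code
  ......
}
 df   ## say some df is generated each time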

Solution

From the comments, I understand you are trying to generate a list of data.frame objects over sequential iterations of some algorithm - each of which produces a new data.frame.

Suppose we have some function f() which generates a new data.frame, from some source, and perhaps uploads the data.frame before returning it.

f <- function() {
    # read a file, do some work, produce a dataframe, etc
    df # return the new data.frame()
}

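The problem with using append or something similar to add the new data.frame to the list is that it has a habit of "unrolling" the frame and merging it in: a data.frame is itself a list of columns, so append adds each column as a separate list element instead of adding the frame as a single element.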
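
To make the "unrolling" concrete, here is a minimal sketch. The frame and the names unrolled and kept are made up for illustration; the sketch only assumes base R's append().

```r
# A small data frame with two columns (the names x and y are illustrative).
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# append() concatenates, so the data frame is treated as a list of columns.
unrolled <- append(list(), df)
length(unrolled)            # 2: one element per column, not one data frame
class(unrolled[[1]])        # "integer": the x column on its own

# Wrapping the frame in list() keeps it together as a single element.
kept <- append(list(), list(df))
length(kept)                # 1
is.data.frame(kept[[1]])    # TRUE
```

Wrapping in list() is one way to make append behave, but the [[ ]] assignment shown in this answer avoids the issue altogether.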

Instead, your code needs a structure like this:

output_list <- list() # A list to hold the generated frames

while (more_work_to_do) {
    df <- f() #One iteration
    output_list[[length(output_list)+1]] <- df
}

# At this point, output_list is a list of the generated data frames
# with all their internal structure preserved.

It's important to use the [[ ]] operator for the insert to avoid the "number of items to replace is not a multiple of replacement length" warning, which single brackets produce because they try to fit the frame's columns into the one slot being assigned. The length(output_list)+1 construct simply means "one past the current end of the list" and in effect does an append for you without needing to maintain a separate counter.
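
The difference between the two operators can be checked with a short sketch. The variable names with_single, with_double and msg are hypothetical, and the behaviour shown is what base R does for a plain list on the left-hand side.

```r
df <- data.frame(x = 1:3, y = 4:6)

# Single brackets: both columns of df are offered to one slot.
with_single <- list()
msg <- tryCatch(
  { with_single[1] <- df; "no warning" },
  warning = function(w) conditionMessage(w)
)
msg  # text of the warning raised by the assignment

# Repeating the assignment with the warning silenced shows what gets stored.
suppressWarnings(with_single[1] <- df)
is.data.frame(with_single[[1]])   # FALSE: only the first column was kept

# Double brackets: the whole frame becomes one list element.
with_double <- list()
with_double[[length(with_double) + 1]] <- df
is.data.frame(with_double[[1]])   # TRUE
```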

Here's an example

> f<-function() { data.frame(x=rnorm(5), y=rnorm(5)) }
> output_list <- list()
> for (i in 1:5) output_list[[length(output_list)+1]] <- f()
> length(output_list)
[1] 5
> str(output_list)
List of 5
 $ :'data.frame':   5 obs. of  2 variables:
  ..$ x: num [1:5] -0.347 0.194 -0.406 -0.384 2.24
  ..$ y: num [1:5] -0.756 0.3417 -0.7542 0.1612 -0.0494
 $ :'data.frame':   5 obs. of  2 variables:
  ..$ x: num [1:5] 0.667 -0.186 0.602 -0.239 1.516
  ..$ y: num [1:5] 0.263 -1.322 0.604 -0.135 -0.339
 $ :'data.frame':   5 obs. of  2 variables:
  ..$ x: num [1:5] 1.064 -0.365 -1.584 0.163 0.142
  ..$ y: num [1:5] -0.0782 1.3314 0.0797 -0.4096 0.4819
 $ :'data.frame':   5 obs. of  2 variables:
  ..$ x: num [1:5] -2.0448 -0.4228 -0.5305 -0.0611 0.4114
  ..$ y: num [1:5] -0.608 -0.74 -0.196 -0.957 0.653
 $ :'data.frame':   5 obs. of  2 variables:
  ..$ x: num [1:5] 0.582 -1.029 -1.222 1.755 0.259
  ..$ y: num [1:5] 1.733 0.319 -0.597 -1.814 0.446
> output_list
[[1]]
           x           y
1 -0.3474823 -0.75595301
2  0.1941049  0.34170577
3 -0.4055180 -0.75424689
4 -0.3838479  0.16122522
5  2.2397387 -0.04936943

[[2]]
           x          y
1  0.6674517  0.2625242
2 -0.1859460 -1.3219586
3  0.6020241  0.6042548
4 -0.2387514 -0.1345904
5  1.5158875 -0.3392787

[[3]]
           x           y
1  1.0639814 -0.07823834
2 -0.3645768  1.33144410
3 -1.5839606  0.07973743
4  0.1630311 -0.40957609
5  0.1420562  0.48187377

[[4]]
            x          y
1 -2.04475082 -0.6083283
2 -0.42280601 -0.7396052
3 -0.53048188 -0.1961052
4 -0.06107144 -0.9571272
5  0.41136718  0.6526753

[[5]]
           x          y
1  0.5821866  1.7325293
2 -1.0289847  0.3186825
3 -1.2218606 -0.5971967
4  1.7548963 -1.8136810
5  0.2592219  0.4463977

> 
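
When every iteration corresponds to one input file, as in the question's sample code, an alternative to growing the list in a loop is to let lapply return the frames directly. This is a sketch of that alternative and not part of the original answer. The temporary folder, the file names part1.csv to part3.csv, and the columns V1 and V2 are invented so that the example runs on its own.

```r
# Hypothetical setup: write three small CSV files to a temporary folder.
tmp <- file.path(tempdir(), "frames_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(V1 = i + 0:1, V2 = i * 2 + 0:1),
            file.path(tmp, paste0("part", i, ".csv")),
            row.names = FALSE)
}

# One data frame per file, collected directly into a list.
csv_files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(csv_files, function(path) {
  df <- read.csv(path, header = TRUE)
  # per-file processing would go here
  df
})
names(frames) <- basename(csv_files)   # label each frame with its file name

length(frames)                                   # 3
all(vapply(frames, is.data.frame, logical(1)))   # TRUE
```

Because lapply always returns a list with one element per input, each frame stays intact and no counter or append step is needed.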
