Averaging data from multiple data files in Python with pandas

Question

I have 30 CSV data files from 30 replicate runs of an experiment. I am using pandas' read_csv() function to read the data into a list of DataFrames. I would like to build a single DataFrame from this list in which each cell holds the average of the corresponding cells across the 30 DataFrames. Is there a built-in way to accomplish this?

To clarify, I'll expand on the example in the answers below. Say I have two DataFrames:

>>> x
          A         B         C
0 -0.264438 -1.026059 -0.619500
1  0.927272  0.302904 -0.032399
2 -0.264273 -0.386314 -0.217601
3 -0.871858 -0.348382  1.100491
>>> y
          A         B         C
0  1.923135  0.135355 -0.285491
1 -0.208940  0.642432 -0.764902
2  1.477419 -1.659804 -0.431375
3 -1.191664  0.152576  0.935773

Which merging function should I use to build a sort of 3D array out of the DataFrames? For example:

>>> automagic_merge(x, y)
                      A                      B                      C
0 [-0.264438,  1.923135] [-1.026059,  0.135355] [-0.619500, -0.285491]
1 [ 0.927272, -0.208940] [ 0.302904,  0.642432] [-0.032399, -0.764902]
2 [-0.264273,  1.477419] [-0.386314, -1.659804] [-0.217601, -0.431375]
3 [-0.871858, -1.191664] [-0.348382,  0.152576] [ 1.100491,  0.935773]

That way I can compute the average, the standard error of the mean (s.e.m.), etc. on those per-cell lists instead of over the entire column.
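If a literal "3D array of sorts" is what you want, one option outside of pandas is to stack the frames with NumPy so that the first axis indexes the run. This is a minimal sketch, assuming every run has the same shape, row order and column order; stack_runs is a hypothetical helper name, and the s.e.m. here is the sample standard deviation (ddof=1) divided by the square root of the number of runs:

```python
import numpy as np
import pandas as pd

# The two example runs from the question.
x = pd.DataFrame([[-0.264438, -1.026059, -0.619500],
                  [ 0.927272,  0.302904, -0.032399],
                  [-0.264273, -0.386314, -0.217601],
                  [-0.871858, -0.348382,  1.100491]], columns=list("ABC"))
y = pd.DataFrame([[ 1.923135,  0.135355, -0.285491],
                  [-0.208940,  0.642432, -0.764902],
                  [ 1.477419, -1.659804, -0.431375],
                  [-1.191664,  0.152576,  0.935773]], columns=list("ABC"))


def stack_runs(frames):
    """Stack equally shaped DataFrames into an array of shape (runs, rows, cols)."""
    return np.stack([df.to_numpy() for df in frames])


cube = stack_runs([x, y])  # shape (2, 4, 3): axis 0 indexes the run
n_runs = cube.shape[0]

# Reduce over axis 0 so each cell is summarized across runs,
# then restore the row and column labels of the first run.
mean = pd.DataFrame(cube.mean(axis=0), index=x.index, columns=x.columns)
sem = pd.DataFrame(cube.std(axis=0, ddof=1) / np.sqrt(n_runs),
                   index=x.index, columns=x.columns)
```

Because the labels are dropped while stacking, this only makes sense when every CSV really has identical layout; the pandas approaches keep the labels throughout.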

Answer

Check this out:

In [14]: glued = pd.concat([x, y], axis=1, keys=['x', 'y'])

In [15]: glued
Out[15]: 
          x                             y                    
          A         B         C         A         B         C
0 -0.264438 -1.026059 -0.619500  1.923135  0.135355 -0.285491
1  0.927272  0.302904 -0.032399 -0.208940  0.642432 -0.764902
2 -0.264273 -0.386314 -0.217601  1.477419 -1.659804 -0.431375
3 -0.871858 -0.348382  1.100491 -1.191664  0.152576  0.935773

In [16]: glued.swaplevel(0, 1, axis=1).sort_index(axis=1)
Out[16]: 
          A                   B                   C          
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773

In [17]: glued = glued.swaplevel(0, 1, axis=1).sort_index(axis=1)

In [18]: glued
Out[18]: 
          A                   B                   C          
          x         y         x         y         x         y
0 -0.264438  1.923135 -1.026059  0.135355 -0.619500 -0.285491
1  0.927272 -0.208940  0.302904  0.642432 -0.032399 -0.764902
2 -0.264273  1.477419 -0.386314 -1.659804 -0.217601 -0.431375
3 -0.871858 -1.191664 -0.348382  0.152576  1.100491  0.935773

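For the record, swapping the levels and re-sorting is not necessary; it is only for visual purposes. Without the swap, the original column names sit on level 1, so you would group on level=1 instead of level=0.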
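To make that concrete, here is a minimal sketch that skips swaplevel entirely: the run keys stay on column level 0 and A, B, C on level 1, so the grouping uses level 1. It also transposes instead of passing axis=1 to groupby, since that argument is deprecated as of pandas 2.1:

```python
import pandas as pd

# The two example runs from the question.
x = pd.DataFrame([[-0.264438, -1.026059, -0.619500],
                  [ 0.927272,  0.302904, -0.032399],
                  [-0.264273, -0.386314, -0.217601],
                  [-0.871858, -0.348382,  1.100491]], columns=list("ABC"))
y = pd.DataFrame([[ 1.923135,  0.135355, -0.285491],
                  [-0.208940,  0.642432, -0.764902],
                  [ 1.477419, -1.659804, -0.431375],
                  [-1.191664,  0.152576,  0.935773]], columns=list("ABC"))

# Column level 0 holds the run keys ('x', 'y'); level 1 holds 'A', 'B', 'C'.
glued = pd.concat([x, y], axis=1, keys=["x", "y"])

# Transpose so the column MultiIndex becomes the row index, average the
# rows that share the same level-1 label, then transpose back.
means = glued.T.groupby(level=1).mean().T
```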

Then you can do stuff like:

In [19]: glued.T.groupby(level=0).mean().T
Out[19]: 
          A         B         C
0  0.829349 -0.445352 -0.452496
1  0.359166  0.472668 -0.398650
2  0.606573 -1.023059 -0.324488
3 -1.031761 -0.097903  1.018132
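For the original 30-file case, the frames can also be stacked along rows instead of columns: each row label then appears once per run, and grouping on the row index gathers the same cell from every run. A minimal sketch, assuming all CSVs share the same columns and the default 0..n-1 row index; the run_*.csv file pattern and the load_runs and summarize_runs helpers are hypothetical names:

```python
import glob

import pandas as pd


def load_runs(pattern="run_*.csv"):
    """Read every CSV matching a (hypothetical) file pattern, in sorted order."""
    return [pd.read_csv(path) for path in sorted(glob.glob(pattern))]


def summarize_runs(frames):
    """Return the per-cell mean and standard error of the mean across runs."""
    if not frames:
        raise ValueError("no runs to summarize")
    # Row-wise concat repeats each row label (0, 1, 2, ...) once per run,
    # so grouping on level 0 of the row index collects one value per run.
    grouped = pd.concat(frames).groupby(level=0)
    # GroupBy.sem uses ddof=1 by default.
    return grouped.mean(), grouped.sem()


# Usage with the two frames from the question; with real files this would be
# mean, sem = summarize_runs(load_runs())
x = pd.DataFrame([[-0.264438, -1.026059, -0.619500],
                  [ 0.927272,  0.302904, -0.032399],
                  [-0.264273, -0.386314, -0.217601],
                  [-0.871858, -0.348382,  1.100491]], columns=list("ABC"))
y = pd.DataFrame([[ 1.923135,  0.135355, -0.285491],
                  [-0.208940,  0.642432, -0.764902],
                  [ 1.477419, -1.659804, -0.431375],
                  [-1.191664,  0.152576,  0.935773]], columns=list("ABC"))
mean, sem = summarize_runs([x, y])
```

Stacking along rows avoids building a wide MultiIndex of 30 × (number of columns) columns, and the per-run keys are not needed when only the summary statistics matter.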
