Python Pandas: Create multiple DataFrames from a list


Question

Using this as a quick starting point:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html

In [1]: df
Out[1]: 
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804

Then isolating 'A' gives:

In [2]: df[df['variable'] == 'A']
Out[2]: 
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059

Creating a new dataframe for 'A' would then be:

dfA = df[df['variable'] == 'A'] 

And, say, B would be:

dfB = df[df['variable'] == 'B'] 

So, to isolate the dataframes into dfA, dfB, dfC, ..., I tried:

dfList  = list(set(df['variable']))
dfNames = ["df" + row for row in dfList]  

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfNames[i] = dfNew      

It runs... but when I try dfA I get the output "dfA" is not defined.

Recommended answer

To answer your question literally, globals()['dfA'] = dfNew would define dfA in the global namespace:

for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    globals()[dfName] = dfNew   

However, there is never a good reason to define dynamically-named variables.


  • If the names are not known until runtime -- that is, if the names are truly dynamic -- then you can't use the names in your code, since your code has to be written before runtime. So what's the point of creating a variable named dfA if you can't refer to it in your code?

If, on the other hand, you know beforehand that you will have a variable named dfA, then your code isn't really dynamic. You have static variable names. The only reason to use the loop is to cut down on boilerplate code. However, even in this case, there is a better alternative. The solution is to use a dict (see below) or a list [1].

Adding dynamically-named variables pollutes the global namespace.

It does not generalize well. If you had 100 dynamically-named variables, how would you access them? How would you loop over them?

To "manage" dynamically-named variables you would need to keep a list of their names as strings, e.g. ['dfA', 'dfB', 'dfC', ...], and then access the newly minted global variables via the globals() dict, e.g. globals()['dfA']. That is awkward.

So the conclusion programmers reach through bitter experience is that dynamically-named variables are somewhere between awkward and useless, and that it is much more pleasant, powerful, and practical to store key/value pairs in a dict. The name of the variable becomes a key in the dict, and the value of the variable becomes the value associated with the key. So, instead of having a bare name dfA, you would have a dict dfs and you would access the dfA DataFrame via dfs['dfA']:

dfs = dict()
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfs[dfName] = dfNew   

or, as Jianxun Li shows,

dfs = {k: g for k, g in df.groupby('variable')}
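As a minimal, self-contained sketch of the dict approach: the frame below is a shortened stand-in for the one in the question (two groups only), and dfs_named is a hypothetical name introduced here. One detail worth noting is that groupby keys are the values of the 'variable' column, so the dict built this way is keyed by 'A', 'B', ... rather than 'dfA', 'dfB', ...; a prefix can be added if the longer names are wanted.

```python
import pandas as pd

# Shortened stand-in for the long-format frame shown in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"] * 2),
    "variable": ["A"] * 3 + ["B"] * 3,
    "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215],
})

# groupby yields (key, sub-frame) pairs, so the dict keys are the
# values of the 'variable' column: 'A', 'B', ...
dfs = {k: g for k, g in df.groupby("variable")}

# If keys like 'dfA' are wanted (as in the loop version), prefix them.
# dfs_named is an illustrative name, not part of the original answer.
dfs_named = {f"df{k}": g for k, g in df.groupby("variable")}

print(sorted(dfs))        # the group keys
print(dfs_named["dfA"])   # the rows where variable == 'A'
```

Either dict gives one place to look up, count, or loop over all the sub-frames, which is what the dynamically-named variables could not offer.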

This is why Jon Clements and Jianxun Li answered your question by showing alternatives to defining dynamically-named variables. It's because we all believe it is a terrible idea.

Using Jianxun Li's solution, to loop over the dict's key/value pairs (see https://docs.python.org/3/library/stdtypes.html#dict.items) you could then use:

dfs = {k: g for k, g in df.groupby('variable')}
for key, df in dfs.items():
    ...

or, using Jon Clements' solution, to iterate through the groups you could use:

grouped = df.groupby('variable')
for key, df in grouped:
    ...
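To make the loop body concrete, here is a small sketch that computes a per-group mean while iterating. The toy values and the names sub_df and means are assumptions made for illustration; the loop variable is deliberately not called df, so the original frame is not rebound inside the loop.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "variable": ["A", "A", "B", "B"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Each iteration yields the group key and the matching sub-frame.
means = {}
for key, sub_df in df.groupby("variable"):
    means[key] = sub_df["value"].mean()

print(means)
```

For a simple aggregate like this, df.groupby("variable")["value"].mean() would do the same without an explicit loop; the loop form is mainly useful when each group needs custom handling.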






[1] If the names are numbered or ordered you could use a list instead of a dict.
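A possible sketch of the list variant mentioned in the footnote, assuming the groups only need to be addressed by position. The data and the name frames are made up for illustration. By default groupby sorts by key, so the list order follows the sorted group labels.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({
    "variable": ["A", "A", "B", "C"],
    "value": [0.1, 0.2, 0.3, 0.4],
})

# Keep only the sub-frames; groupby sorts by key by default,
# so the list order here is A, B, C.
frames = [g for _, g in df.groupby("variable")]

first = frames[0]  # rows where variable == 'A'
print(len(frames))
print(first)
```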
