Python Pandas Create Multiple dataframes from list
Question
Using this as a quick starting point:
http://pandas.pydata.org/pandas-docs/stable/reshaping.html
In [1]: df
Out[1]:
         date variable     value
0  2000-01-03        A  0.469112
1  2000-01-04        A -0.282863
2  2000-01-05        A -1.509059
3  2000-01-03        B -1.135632
4  2000-01-04        B  1.212112
5  2000-01-05        B -0.173215
6  2000-01-03        C  0.119209
7  2000-01-04        C -1.044236
8  2000-01-05        C -0.861849
9  2000-01-03        D -2.104569
10 2000-01-04        D -0.494929
11 2000-01-05        D  1.071804
Then isolating 'A' gives:
In [2]: df[df['variable'] == 'A']
Out[2]:
        date variable     value
0 2000-01-03        A  0.469112
1 2000-01-04        A -0.282863
2 2000-01-05        A -1.509059
Now creating a new dataframe would be:
dfA = df[df['variable'] == 'A']
And let's say B would be:
dfB = df[df['variable'] == 'B']
So, isolating the dataframes into dfA, dfB, dfC, ...:
dfList = list(set(df['variable']))
dfNames = ["df" + row for row in dfList]
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfNames[i] = dfNew
It runs... But when I try dfA, I get the output "dfA" is not defined.
Recommended answer
To answer your question literally, globals()['dfA'] = dfNew would define dfA in the global namespace:
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    globals()[dfName] = dfNew
However, there is never a good reason to define dynamically-named variables.
- If the names are not known until runtime, that is, if the names are truly dynamic, then you can't use the names in your code, since your code has to be written before runtime. So what's the point of creating a variable named dfA if you can't refer to it in your code?

  If, on the other hand, you know beforehand that you will have a variable named dfA, then your code isn't really dynamic. You have static variable names. The only reason to use the loop is to cut down on boilerplate code. However, even in this case, there is a better alternative. The solution is to use a dict (see below) or a list [1].
- Adding dynamically-named variables pollutes the global namespace.
- It does not generalize well. If you had 100 dynamically named variables, how would you access them? How would you loop over them?

  To "manage" dynamically named variables you would need to keep a list of their names as strings, e.g. ['dfA', 'dfB', 'dfC', ...], and then access the newly minted global variables via the globals() dict, e.g. globals()['dfA']. That is awkward.
So the conclusion programmers reach through bitter experience is that dynamically-named variables are somewhere between awkward and useless, and that it is much more pleasant, powerful, and practical to store key/value pairs in a dict. The name of the variable becomes a key in the dict, and the value of the variable becomes the value associated with the key. So, instead of having a bare name dfA, you would have a dict dfs and you would access the dfA DataFrame via dfs['dfA']:
dfs = dict()
for i, row in enumerate(dfList):
    dfName = dfNames[i]
    dfNew = df[df['variable'] == row]
    dfs[dfName] = dfNew
Or, as Jianxun Li shows,
dfs = {k: g for k, g in df.groupby('variable')}
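As a self-contained sketch (the small DataFrame below is made-up sample data, not the question's), the two approaches appear to produce the same sub-frames. The one difference worth noting is that the groupby keys are the raw group values ('A', 'B') rather than 'dfA', 'dfB':

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-03", "2000-01-04"],
    "variable": ["A", "A", "B", "B"],
    "value": [0.47, -0.28, -1.14, 1.21],
})

# Loop version: keys are "dfA", "dfB", ...
dfs_loop = {}
for name in df["variable"].unique():
    dfs_loop["df" + name] = df[df["variable"] == name]

# groupby version: keys are the group values themselves, "A", "B", ...
dfs_grouped = {key: group for key, group in df.groupby("variable")}

# Both dicts hold the same rows for each group.
for key, group in dfs_grouped.items():
    assert group.equals(dfs_loop["df" + key])

print(dfs_grouped["A"])
```

So with the groupby form you would write dfs['A'] where the loop form uses dfs['dfA'].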
This is why Jon Clements and Jianxun Li answered your question by showing alternatives to defining dynamically-named variables. It's because we all believe it is a terrible idea.
Using Jianxun Li's solution, to loop over a dict's key/value pairs (https://docs.python.org/3/library/stdtypes.html#dict.items) you could then use:
dfs = {k: g for k, g in df.groupby('variable')}
for key, df in dfs.items():
    ...
or using Jon Clements' solution, to iterate through groups you could use:
grouped = df.groupby('variable')
for key, df in grouped:
    ...
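For instance, a per-group summary can be computed inside the loop without ever naming the individual frames. This is a minimal sketch with made-up sample data; the loop variable is called group here (an illustrative choice) so that df is not rebound while iterating:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "variable": ["A", "A", "B", "B"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

means = {}
for key, group in df.groupby("variable"):
    # key is the group label; group is the matching sub-DataFrame.
    means[key] = float(group["value"].mean())

print(means)
```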
[1] If the names are numbered or ordered you could use a list instead of a dict.
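A minimal sketch of the list alternative, assuming made-up sample data and that positional order (here, the sorted group label) is all you need to tell the frames apart:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "variable": ["B", "A", "B", "A"],
    "value": [10.0, 1.0, 20.0, 3.0],
})

# groupby sorts group labels by default, so frames[0] holds the "A" rows
# and frames[1] holds the "B" rows.
frames = [group for _, group in df.groupby("variable")]

print(len(frames))
print(frames[0])
```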