Getting the unique values of every column in a pandas dataframe - to help me create smaller, more manageable dataframes to perform metrics on
Problem description
I started off wanting to turn a column from a pandas dataframe into a list and then get its unique values, with the aim of iterating over those unique values in a for loop and creating a few smaller dataframes, i.e. one for each cluster. Then I want to store these smaller dataframes in a dictionary object.
@ben suggested I start a new question and ask about using the pandas DataFrame groupby method for this task.
My original post is here: Get list from pandas dataframe column
My Data:
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
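To make the snippets reproducible, the sample data above can be loaded into a DataFrame named df, the name the answer below uses. This is a minimal sketch that assumes the table is whitespace-separated exactly as shown; load_date is kept as text rather than parsed into dates. It also shows Series.unique(), which returns a column's distinct values in order of first appearance, and a dict comprehension that applies it to every column as the title asks. DATA and unique_per_column are illustrative names.

```python
import io

import pandas as pd

# The question's sample data, pasted as whitespace-separated text
DATA = """\
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
"""

# sep=r"\s+" splits on any run of whitespace; load_date stays a string column
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# Distinct values of a single column
print(df["cluster"].unique())

# Distinct values of every column, keyed by column name
unique_per_column = {col: df[col].unique() for col in df.columns}
print(unique_per_column)
```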
For example: for item in cluster_list (where cluster_list is the unique set of values in the cluster column), create a dataframe for cluster A where budget > X, etc.
Then do the same for the other clusters, and put them in a dictionary.
Then be able to get a specific dataframe out of the dictionary, say only the dataframe for cluster B where budget > X:
def GetDf(key):
    # dfs stands for the dictionary of per-cluster dataframes
    return dfs[key]
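The loop described in the question can be sketched directly with Series.unique() and boolean masks, before reaching for groupby. This assumes df holds the sample data and uses X = 10000 as an example threshold, since the question leaves X open. DATA and dfs are illustrative names, and GetDf follows the question's pseudocode.

```python
import io

import pandas as pd

DATA = """\
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

X = 10000  # example threshold; the question leaves X unspecified

# One dataframe per unique cluster label, keeping only rows with budget > X
dfs = {}
for item in df["cluster"].unique():
    dfs[item] = df[(df["cluster"] == item) & (df["budget"] > X)]


def GetDf(key):
    # Return the filtered dataframe stored under one cluster label
    return dfs[key]


print(GetDf("B"))
```

This works, but it scans the whole frame once per cluster; groupby, as in the answer, splits it in a single pass.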
Thanks in advance
Recommended answer
There are two parts to this question. First, keep only the rows where budget > X (here X = 10000):
In [11]: df1 = df[df['budget'] > 10000]
In [12]: df1
Out[12]:
cluster load_date budget actual fixed_price
1 A 2/1/2014 12000 10000 Y
2 A 3/1/2014 36000 2000 Y
3 B 4/1/2014 15000 10000 N
4 B 4/1/2014 12000 11500 N
5 B 4/1/2014 90000 11000 N
6 C 7/1/2014 22000 18000 N
7 C 8/1/2014 30000 28960 N
8 C 9/1/2014 53000 51200 N
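The filtering step works because comparing a column with a scalar produces a boolean Series aligned with the DataFrame's index, and indexing with that Series keeps only the rows where it is True. A small sketch on the sample data, with X = 10000 to match the answer (DATA and mask are illustrative names):

```python
import io

import pandas as pd

DATA = """\
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

X = 10000

# Element-wise comparison: one True/False per row, sharing df's index
mask = df["budget"] > X
print(mask.tolist())  # [False, True, True, True, True, True, True, True, True]

# Boolean indexing keeps the True rows and preserves their original labels
df1 = df[mask]
print(df1.index.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
```

That is why the output above starts at index 1: row 0 (budget 1000) was dropped, and the surviving rows keep their original labels.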
Now you can group by cluster and get the individual groups:
In [13]: g = df1.groupby('cluster')
In [14]: g.get_group('A')
Out[14]:
cluster load_date budget actual fixed_price
1 A 2/1/2014 12000 10000 Y
2 A 3/1/2014 36000 2000 Y
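A GroupBy object can also be iterated directly: each step yields a (key, sub-DataFrame) pair, which is what makes the dictionary approach possible. A minimal sketch on the filtered sample data (DATA is an illustrative name):

```python
import io

import pandas as pd

DATA = """\
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

df1 = df[df["budget"] > 10000]
g = df1.groupby("cluster")

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs,
# in sorted key order by default
for name, group in g:
    print(name, len(group))

# get_group returns the sub-DataFrame for a single key
print(g.get_group("B"))
```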
Note: if you really want a dictionary, you can build one from the groupby object:
In [15]: d = dict(iter(g))
In [16]: d['A']
Out[16]:
cluster load_date budget actual fixed_price
1 A 2/1/2014 12000 10000 Y
2 A 3/1/2014 36000 2000 Y
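The dict(iter(g)) call above is equivalent to a dict comprehension over the groups. The sketch below wraps filter-then-group in a helper, split_by_cluster (a hypothetical name, not a pandas function), so the threshold X becomes a parameter. One consequence worth knowing: a cluster with no rows above X gets no key at all, so dict.get is safer than indexing when that can happen.

```python
import io

import pandas as pd

DATA = """\
cluster load_date budget actual fixed_price
A 1/1/2014 1000 4000 Y
A 2/1/2014 12000 10000 Y
A 3/1/2014 36000 2000 Y
B 4/1/2014 15000 10000 N
B 4/1/2014 12000 11500 N
B 4/1/2014 90000 11000 N
C 7/1/2014 22000 18000 N
C 8/1/2014 30000 28960 N
C 9/1/2014 53000 51200 N
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")


def split_by_cluster(frame, X):
    # Hypothetical helper: keep rows with budget > X, then map each
    # cluster label to its remaining rows
    filtered = frame[frame["budget"] > X]
    return {name: group for name, group in filtered.groupby("cluster")}


d = split_by_cluster(df, 10000)
print(d["B"])  # cluster B where budget > 10000

# With a higher threshold, cluster A has no qualifying rows and no key
d_high = split_by_cluster(df, 40000)
print(sorted(d_high))  # ['B', 'C']
print(d_high.get("A"))  # None
```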