Python - splitting dataframe into multiple dataframes based on column values and naming them with those values
Question
I have a large dataset listing competitor products on sale in different regions across the country. I want to split this dataframe into several others, one per region, using an iterative process that puts the column values into the names of the new dataframes, so that I can work with each one separately, e.g. to sort the information in each region by price to understand what the market looks like there. A simplified version of the data is below:
Competitor Region ProductA ProductB
Comp1 A £10 £15
Comp1 B £11 £16
Comp1 C £11 £15
Comp2 A £9 £16
Comp2 B £12 £14
Comp2 C £14 £17
Comp3 A £11 £16
Comp3 B £10 £15
Comp3 C £12 £15
I can create a list of the regions using the below:
region_list = df['Region'].unique().tolist()
I was hoping to use this list in an iterative loop that produces a number of dataframes, e.g.
df_A :
Competitor Region ProductA ProductB
Comp1 A £10 £15
Comp2 A £9 £16
Comp3 A £11 £16
I could do this using the code
df_A = df.loc[df['Region'] == 'A']
but in reality the dataset has a large number of regions, which would make this approach tedious. Is there a way to create an iterative loop that replicates this? There is a similar question about splitting dataframes, but its answer does not show how to label the outputs based on each column value.
I'm quite new to Python and still learning, so if there is a different, more sensible way of approaching this problem, I'm very open to suggestions.
Recommended answer
Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:
for region, df_region in df.groupby('Region'):
    print(df_region)
Competitor Region ProductA ProductB
0 Comp1 A £10 £15
3 Comp2 A £9 £16
6 Comp3 A £11 £16
Competitor Region ProductA ProductB
1 Comp1 B £11 £16
4 Comp2 B £12 £14
7 Comp3 B £10 £15
Competitor Region ProductA ProductB
2 Comp1 C £11 £15
5 Comp2 C £14 £17
8 Comp3 C £12 £15
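The loop above prints each group but does not keep them under region-based names. One common way to get that, sketched here as a suggestion rather than as part of the original answer, is to collect the groups in a dictionary keyed by region instead of creating separate variables such as df_A. The names dfs_by_region and sort_by_price are illustrative, and the sketch assumes the prices are stored as strings with a leading £, as in the sample data.

```python
import pandas as pd

# Rebuild the simplified data from the question.
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1",
                   "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# One dataframe per region, keyed by the region value itself.
dfs_by_region = {region: group for region, group in df.groupby("Region")}

# Look up a region by name instead of relying on a variable called df_A.
df_A = dfs_by_region["A"]


def sort_by_price(frame, column="ProductA"):
    """Return the rows of frame ordered by the numeric price in column.

    Assumes prices are strings with a leading pound sign, e.g. "£10".
    """
    prices = frame[column].str.lstrip("£").astype(float)
    return frame.loc[prices.sort_values().index]


sorted_A = sort_by_price(dfs_by_region["A"])
print(sorted_A)
```

A dictionary keeps the region names as data, so the same code works however many regions the real dataset contains, and dfs_by_region.keys() lists them without a separate region_list.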