Python - splitting dataframe into multiple dataframes based on column values and naming them with those values

Question

I have a large dataset listing competitor products on sale in different regions across the country. I am looking to split this dataframe into several others based on the region via an iterative process using the column values within the names of those new dataframes, so that I can work with each separately - e.g. to sort information in each region by price to understand what the market looks like in each. I've given a simplified version of the data below:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15

I can create a list of the regions using the below:

region_list=df['Region'].unique().tolist()

Which I was hoping to use in an iterative loop that produced a number of dataframes, e.g.

df_A :

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16

I can do this for a single region using the code

df_A = df.loc[df['Region'] == 'A']

but the reality is that this dataset has a large number of areas which would make this code tedious. Is there a way of creating an iterative loop that would replicate this? There is a similar question that asks about splitting dataframes, but the answer does not show how to label outputs based on each column value.

I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.

Recommended answer

Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:

for region, df_region in df.groupby('Region'):
    print(df_region)

  Competitor Region ProductA ProductB
0      Comp1      A      £10      £15
3      Comp2      A       £9      £16
6      Comp3      A      £11      £16
  Competitor Region ProductA ProductB
1      Comp1      B      £11      £16
4      Comp2      B      £12      £14
7      Comp3      B      £10      £15
  Competitor Region ProductA ProductB
2      Comp1      C      £11      £15
5      Comp2      C      £14      £17
8      Comp3      C      £12      £15
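
If you want to refer to each subset by its region value, as the question asks, one common approach is to store the groups in a dictionary keyed by region instead of creating a separate variable per region. The following is a minimal sketch under some assumptions: the sample data is rebuilt from the simplified table in the question, the prices are strings with a leading £, and the names dfs_by_region, sorted_A and ProductA_num are illustrative only.

```python
import pandas as pd

# Simplified data from the question; prices are assumed to be strings like '£10'.
df = pd.DataFrame({
    "Competitor": ["Comp1", "Comp1", "Comp1",
                   "Comp2", "Comp2", "Comp2",
                   "Comp3", "Comp3", "Comp3"],
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# Build one dataframe per region, keyed by the region value.
# 'dfs_by_region' is a hypothetical name, not something from the question.
dfs_by_region = {region: df_region for region, df_region in df.groupby("Region")}

# Look up a single region by its value instead of using a variable like df_A.
df_A = dfs_by_region["A"]

# Sort one region by price. The '£' prefix is stripped into a helper column
# first, because sorting the raw strings would compare them as text.
sorted_A = (
    df_A.assign(
        ProductA_num=df_A["ProductA"].str.replace("£", "", regex=False).astype(float)
    )
    .sort_values("ProductA_num")
)
print(sorted_A)
```

A dictionary scales to any number of regions and avoids generating variable names dynamically, which tends to be hard to maintain. You can still loop over dfs_by_region.items() to process every region in turn.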