Python - splitting dataframe into multiple dataframes based on column values and naming them with those values

Problem description

I have a large dataset listing competitor products on sale in different regions across the country. I am looking to split this dataframe into several others, one per region, using an iterative process that names each new dataframe after its region value. That way I can work with each one separately, e.g. to sort the information in each region by price to understand what the market looks like there. I've given a simplified version of the data below:

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp1       B       £11       £16
Comp1       C       £11       £15
Comp2       A       £9        £16
Comp2       B       £12       £14
Comp2       C       £14       £17
Comp3       A       £11       £16
Comp3       B       £10       £15
Comp3       C       £12       £15

I can create a list of the regions using the following:

region_list = df['Region'].unique().tolist()
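For anyone wanting to reproduce this, the sample table can be rebuilt as a DataFrame. This is a sketch: keeping the prices as strings such as "£10" is an assumption about how the real dataset stores them. `unique()` returns values in order of first appearance, so the list follows the table order.

```python
import pandas as pd

# Sample data from the table above; prices are kept as strings like "£10"
# (an assumption about how the real dataset stores them).
df = pd.DataFrame({
    "Competitor": ["Comp1"] * 3 + ["Comp2"] * 3 + ["Comp3"] * 3,
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# unique() keeps regions in the order they first appear
region_list = df["Region"].unique().tolist()
print(region_list)  # ['A', 'B', 'C']
```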

I was hoping to use this list in an iterative loop that produces a number of dataframes, e.g.

df_A :

Competitor  Region  ProductA  ProductB
Comp1       A       £10       £15
Comp2       A       £9        £16
Comp3       A       £11       £16

I could do this manually for each region with code like:

df_A = df.loc[df['Region'] == 'A']
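The manual line generalizes to a loop over region_list. A minimal sketch, assuming it is acceptable to keep the results in a dictionary (the name `region_dfs` is hypothetical) rather than in separate variables: each region value becomes a key, so `region_dfs['A']` holds the same rows as `df_A`. Keeping the names as dictionary keys avoids generating variable names at runtime, which is hard to read and hard to loop over later.

```python
import pandas as pd

# Sample data from the question's table
df = pd.DataFrame({
    "Competitor": ["Comp1"] * 3 + ["Comp2"] * 3 + ["Comp3"] * 3,
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

region_list = df["Region"].unique().tolist()

# region_dfs is a hypothetical name: one entry per region, keyed by the
# region value, each built with the same boolean filter as the df_A line.
region_dfs = {}
for region in region_list:
    # copy() gives each region its own independent DataFrame
    region_dfs[region] = df.loc[df["Region"] == region].copy()

print(region_dfs["A"])
```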

but the real dataset has a large number of regions, which would make writing this out by hand tedious. Is there a way of creating an iterative loop that would replicate this? There is a similar question about splitting dataframes, but its answer does not show how to label each output based on its column value.

I'm quite new to Python and still learning, so if there is actually a different, more sensible method of approaching this problem I'm very open to suggestions.

Recommended answer

Subsetting by distinct values is called a groupby. If you simply want to iterate through the groups with a for loop, the syntax is:

for region, df_region in df.groupby('Region'):
    print(df_region)

  Competitor Region ProductA ProductB
0      Comp1      A      £10      £15
3      Comp2      A       £9      £16
6      Comp3      A      £11      £16
  Competitor Region ProductA ProductB
1      Comp1      B      £11      £16
4      Comp2      B      £12      £14
7      Comp3      B      £10      £15
  Competitor Region ProductA ProductB
2      Comp1      C      £11      £15
5      Comp2      C      £14      £17
8      Comp3      C      £12      £15
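Building on the groupby, a dictionary comprehension gives each region's DataFrame a name taken from the column value, which is what the question asks for. The sketch also shows the per-region price sort mentioned in the question. It assumes every price is a whole number of pounds written as "£" followed by digits; those strings are converted to integers first, because as text "£9" would sort after "£12". The names `region_dfs` and `sorted_by_price` are hypothetical.

```python
import pandas as pd

# Sample data from the question's table
df = pd.DataFrame({
    "Competitor": ["Comp1"] * 3 + ["Comp2"] * 3 + ["Comp3"] * 3,
    "Region": ["A", "B", "C"] * 3,
    "ProductA": ["£10", "£11", "£11", "£9", "£12", "£14", "£11", "£10", "£12"],
    "ProductB": ["£15", "£16", "£15", "£16", "£14", "£17", "£16", "£15", "£15"],
})

# Strip the pound sign so prices sort numerically rather than as text
# (assumes whole-pound prices in the form "£<digits>").
for col in ["ProductA", "ProductB"]:
    df[col] = df[col].str.lstrip("£").astype(int)

# groupby yields (region, sub-frame) pairs; key each frame by its region
region_dfs = {region: df_region for region, df_region in df.groupby("Region")}

# Example per-region analysis: competitors ordered by ProductA price
sorted_by_price = {
    region: df_region.sort_values("ProductA")
    for region, df_region in region_dfs.items()
}

for region, df_region in sorted_by_price.items():
    print(region)
    print(df_region)
```

groupby sorts the group keys by default, so the dictionary keys come out as 'A', 'B', 'C' regardless of the original row order.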
