Python Pandas: Create Groups by Range using map
Question
I have a large data set and want to create groups based on the cumulative sum as a percentage of the total. I got this working with the map function (see the code below). Is there a better way to do this, for instance if I want to make my groups more granular? Right now I am looking at 5% increments; what if I want 1% increments? I am wondering whether there is another way that doesn't require entering every range explicitly in my "codethem" function.
def codethem(dl):
    # Map a cumulative share (0..1) to the upper bound of its 5% band
    if dl <= .05: return '5'  # <= so a value of exactly .05 is not left unmatched
    elif .05 < dl <= .1: return '10'
    elif .1 < dl <= .15: return '15'
    elif .15 < dl <= .2: return '20'
    elif .2 < dl <= .25: return '25'
    elif .25 < dl <= .3: return '30'
    elif .3 < dl <= .35: return '35'
    elif .35 < dl <= .4: return '40'
    elif .4 < dl <= .45: return '45'
    elif .45 < dl <= .5: return '50'
    elif .5 < dl <= .55: return '55'
    elif .55 < dl <= .6: return '60'
    elif .6 < dl <= .65: return '65'
    elif .65 < dl <= .7: return '70'
    elif .7 < dl <= .75: return '75'
    elif .75 < dl <= .8: return '80'
    elif .8 < dl <= .85: return '85'
    elif .85 < dl <= .9: return '90'
    elif .9 < dl <= .95: return '95'
    elif .95 < dl <= 1: return '100'
    else: return 'None'

my_df['code'] = my_df['sales_csum_aspercent'].map(codethem)
Thanks!
Recommended answer
There is a dedicated function for this: pd.cut()
Demo:
Create a random DataFrame:
In [393]: df = pd.DataFrame({'a': np.random.rand(10)})
In [394]: df
Out[394]:
a
0 0.860256
1 0.399267
2 0.209185
3 0.773647
4 0.294845
5 0.883161
6 0.985758
7 0.559730
8 0.723033
9 0.126226
We should specify bins when calling pd.cut():
In [404]: np.linspace(0, 1, 11)
Out[404]: array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
In [395]: pd.cut(df.a, bins=np.linspace(0, 1, 11))
Out[395]:
0 (0.8, 0.9]
1 (0.3, 0.4]
2 (0.2, 0.3]
3 (0.7, 0.8]
4 (0.2, 0.3]
5 (0.8, 0.9]
6 (0.9, 1]
7 (0.5, 0.6]
8 (0.7, 0.8]
9 (0.1, 0.2]
Name: a, dtype: category
Categories (10, object): [(0, 0.1] < (0.1, 0.2] < (0.2, 0.3] < (0.3, 0.4] ... (0.6, 0.7] < (0.7, 0.8] < (0.8, 0.9] < (0.9, 1]]
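One detail about the bin edges is worth checking. As the output above shows, the intervals are open on the left and closed on the right, so a value exactly equal to the lowest edge (0 here) falls outside every bin and comes back as NaN. The sketch below uses a small hand-made Series (the values are illustrative and not from the question) to show how include_lowest=True changes that:

```python
import numpy as np
import pandas as pd

# Illustrative values, chosen to include the lowest edge (0.0)
s = pd.Series([0.0, 0.05, 0.5, 1.0])
bins = np.linspace(0, 1, 11)

# Default: intervals are (left, right], so 0.0 belongs to no bin
default_cut = pd.cut(s, bins=bins)

# include_lowest=True makes the first interval include its left edge
inclusive_cut = pd.cut(s, bins=bins, include_lowest=True)

print(default_cut.isna().tolist())    # [True, False, False, False]
print(inclusive_cut.isna().tolist())  # [False, False, False, False]
```

For a cumulative percentage this only matters if a row can be exactly 0, but it is cheap to guard against.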
If we want custom labels, we should specify them explicitly:
In [401]: bins = np.linspace(0,1, 11)
NOTE: the number of bin labels must be one fewer than the number of bin edges
In [402]: labels = (bins[1:]*100).astype(int)
In [412]: labels
Out[412]: array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
In [403]: pd.cut(df.a, bins=bins, labels=labels)
Out[403]:
0 90
1 40
2 30
3 80
4 30
5 90
6 100
7 60
8 80
9 20
Name: a, dtype: category
Categories (10, int64): [10 < 20 < 30 < 40 ... 70 < 80 < 90 < 100]
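The labels above are derived by multiplying float edges by 100 and truncating with astype(int). As a possible alternative, the labels can be built from integers directly, which avoids relying on float truncation and keeps the "one fewer label than edges" rule easy to verify. The helper name percent_bins and its step_pct parameter are hypothetical, introduced only for this sketch:

```python
import numpy as np
import pandas as pd

def percent_bins(step_pct):
    """Return (edges, labels) for bins of width step_pct percent over [0, 1].

    step_pct is assumed to divide 100 evenly (e.g. 1, 5, 10).
    """
    n_bins = 100 // step_pct
    edges = np.linspace(0, 1, n_bins + 1)
    # Integer labels 'step_pct, 2*step_pct, ..., 100'; no float rounding involved
    labels = np.arange(step_pct, 100 + step_pct, step_pct)
    return edges, labels

edges, labels = percent_bins(10)

# Two of the values from the demo DataFrame above
sample = pd.Series([0.126226, 0.860256])
result = pd.cut(sample, bins=edges, labels=labels)
print(result.tolist())  # [20, 90]
```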
Let's do it with 5% steps:
In [419]: bins = np.linspace(0, 1, 21)
In [420]: bins
Out[420]: array([ 0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ])
In [421]: labels = (bins[1:]*100).astype(int)
In [422]: labels
Out[422]: array([ 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100])
In [423]: pd.cut(df.a, bins=bins, labels=labels)
Out[423]:
0 90
1 40
2 25
3 80
4 30
5 90
6 100
7 60
8 75
9 15
Name: a, dtype: category
Categories (20, int64): [5 < 10 < 15 < 20 ... 85 < 90 < 95 < 100]
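The same pattern extends to the 1% increments asked about in the question: only the number of edges changes. The sketch below applies it to a cumulative-share column. The column name sales_csum_aspercent comes from the question; the sales figures and the step variable are made up for illustration, and the labels are built from integers rather than from the float edges:

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures; only the column name
# 'sales_csum_aspercent' is taken from the question
my_df = pd.DataFrame({'sales': [7, 5, 3, 2, 1]})
my_df['sales_csum_aspercent'] = my_df['sales'].cumsum() / my_df['sales'].sum()

step = 1  # bin width in percent; assumed to divide 100 evenly
bins = np.linspace(0, 1, 100 // step + 1)    # 101 edges for 1% bins
labels = np.arange(step, 100 + step, step)   # 100 labels: 1, 2, ..., 100

my_df['code'] = pd.cut(my_df['sales_csum_aspercent'], bins=bins, labels=labels)
print(my_df['code'].tolist())  # [39, 67, 84, 95, 100]
```

Changing step to 5 or 10 gives the coarser groupings shown earlier without touching the rest of the code.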