Python: Binning based on 2 columns in Pandas

Question

Looking for a quick and elegant way to bin based on 2 columns in Pandas.

Here is my DataFrame:

                              filename  height   width
0        shopfronts_23092017_3_285.jpg   750.0   560.0
1                   shopfronts_200.jpg   4395.0  6020.0
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0
3                   shopfronts_101.jpg   480.0   640.0
4                   shopfronts_138.jpg   3733.0  8498.0
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0
6      shopfronts_25092017_neon_33.jpg   100.0   200.0
7                   shopfronts_322.jpg   682.0  1024.0
8                   shopfronts_171.jpg   800.0   600.0
9         shopfronts_23092017_3_35.jpg   120.0   210.0
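
To make the example reproducible, the sample DataFrame above can be rebuilt in code. This is a minimal sketch that only copies the values from the table; the variable name df matches the one used in the answer.

```python
import pandas as pd

# Sample data from the question: image file names and their resolutions.
df = pd.DataFrame({
    'filename': [
        'shopfronts_23092017_3_285.jpg',
        'shopfronts_200.jpg',
        'shopfronts_25092017_eateries_98.jpg',
        'shopfronts_101.jpg',
        'shopfronts_138.jpg',
        'shopfronts_25092017_eateries_95.jpg',
        'shopfronts_25092017_neon_33.jpg',
        'shopfronts_322.jpg',
        'shopfronts_171.jpg',
        'shopfronts_23092017_3_35.jpg',
    ],
    'height': [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    'width': [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})
```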

I need to bin the records based on two columns, height and width (the image resolution).

I'm looking for something like this:

                              filename  height   width    group
0        shopfronts_23092017_3_285.jpg   750.0   560.0       g3 
1                   shopfronts_200.jpg   4395.0  6020.0      g4  
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0   others
3                   shopfronts_101.jpg   480.0   640.0   others
4                   shopfronts_138.jpg   3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0       g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0       g1
7                   shopfronts_322.jpg   682.0  1024.0   others
8                   shopfronts_171.jpg   800.0   600.0       g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0       g1

where (sizes are height x width):

g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that don't meet any of these requirements (e.g. records 2, 3 and 7, where either the height or the width falls within a group's range, but not both)
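
One literal reading of the "others" rule is to bin height and width separately and keep a group only when both columns land in the same bin. The sketch below implements that reading with pd.cut. The bin edges are taken from the g1 to g4 definitions (sizes as height x width); the lower edge of 0 and the labels list are assumptions for illustration. This reading is stricter than a plain "fits within" check: a 300x500 image would be "others" here, whereas a check of the form height <= 800 and width <= 600 would place it in g3. On the sample data both readings give the same groups.

```python
import numpy as np
import pandas as pd

# Heights and widths from the question's sample data (filenames omitted).
df = pd.DataFrame({
    'height': [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    'width': [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

labels = ['g1', 'g2', 'g3', 'g4']
# Per-dimension bin edges (assumed lower edge 0). pd.cut bins are
# right-closed by default, so a height of 800 falls in g3, not g4.
height_bins = [0, 400, 640, 800, np.inf]
width_bins = [0, 300, 480, 600, np.inf]

# Bin each column on its own, then convert the categorical labels to strings.
h = pd.cut(df['height'], bins=height_bins, labels=labels, include_lowest=True).astype(str)
w = pd.cut(df['width'], bins=width_bins, labels=labels, include_lowest=True).astype(str)

# Keep the group only when height and width land in the same bin.
df['group'] = np.where(h == w, h, 'others')
print(df)
```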

I'd then like to get a frequency count per group using the group column. If this isn't the best way to go about it and there is a better one, please let me know.

Answer

Using np.where

import numpy as np

# The outermost np.where takes precedence, so the first condition
# that matches (g1, then g2, then g3, then g4) decides the group;
# rows matching none of them become 'others'.
df['group'] = np.where((df.height <= 400) & (df.width <= 300), 'g1',
              np.where((df.height <= 640) & (df.width <= 480), 'g2',
              np.where((df.height <= 800) & (df.width <= 600), 'g3',
              np.where((df.height > 800) & (df.width > 600), 'g4',
                       'others'))))

print(df)
                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1
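
The nested np.where above can also be written as a flat list of conditions with np.select, which returns the choice for the first condition that is True and falls back to default otherwise. The sketch below is an alternative, not part of the original answer: it keeps the answer's conditions unchanged and then computes the frequency count from the question with value_counts. The reindex step is an optional assumption that lists every group, including empty ones such as g2, in a fixed order.

```python
import numpy as np
import pandas as pd

# Heights and widths from the question's sample data (filenames omitted).
df = pd.DataFrame({
    'height': [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    'width': [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

# Same conditions as the nested np.where, checked in this order.
conditions = [
    (df.height <= 400) & (df.width <= 300),
    (df.height <= 640) & (df.width <= 480),
    (df.height <= 800) & (df.width <= 600),
    (df.height > 800) & (df.width > 600),
]
choices = ['g1', 'g2', 'g3', 'g4']
df['group'] = np.select(conditions, choices, default='others')

# Frequency count per group; reindex adds missing groups with a count of 0.
counts = df['group'].value_counts().reindex(choices + ['others'], fill_value=0)
print(counts)
```

With np.select, adding or changing a group is one more entry in conditions and choices rather than another level of nesting.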
