Grouping variables based on conditions

Question

I want to split the following data into 64 groups. Each object has two variables, x and y, and I would like to group the objects based on a condition. Both x and y range from 0 to 2000, and I want to break them into 64 groups: the first one should have x < 250 and y < 250, the next one 250…

Sample data:
index  x     y
1      10    100
2      270   60
3      550   1000
4      658   1900
5      364   810
6      74    1890
...
6000   64    71

Could you please tell me a way to do this? I currently have my data in a DataFrame, but I'm not sure whether that is the way to go. Some colleagues told me to avoid using loops over DataFrames. I have also attached a picture of my scatterplot, which may help you visualize my data. Thank you in advance!

Solution

Use pd.cut() to bin your variables into x and y categories, then construct each row's group number from its two bin indices according to whatever ordering you need. My code below numbers the cells left to right within each row, starting with the bottom row (group = x_bin + 8 * y_bin).

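import numpy as np
import pandas as pd
bins = [250 * i for i in range(9)]  # edges 0, 250, ..., 2000
labels = list(range(8))             # bin indices 0..7 per axis
# right=False makes the bins half-open, [0, 250), [250, 500), ..., so x < 250 lands in bin 0
df['x_bin'] = pd.cut(df['x'], bins, labels=labels, right=False)
df['y_bin'] = pd.cut(df['y'], bins, labels=labels, right=False)
# row-major numbering: group = x_bin + 8 * y_bin, giving groups 0..63
df['group'] = df['x_bin'].astype(np.int8) + df['y_bin'].astype(np.int8).multiply(8)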
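As a quick check of the numbering, here is a self-contained run of the same pd.cut approach on the six sample rows from the question. It uses right=False so that the bins are half-open ([0, 250), [250, 500), ...) and a value below 250 falls into the first bin, as the question asks; it also assumes every value satisfies 0 <= x, y < 2000, because a value of exactly 2000 would fall outside the last bin.

```python
import numpy as np
import pandas as pd

# The six sample rows shown in the question (the real data has about 6000 rows).
df = pd.DataFrame(
    {"x": [10, 270, 550, 658, 364, 74], "y": [100, 60, 1000, 1900, 810, 1890]},
    index=[1, 2, 3, 4, 5, 6],
)

bins = [250 * i for i in range(9)]  # edges 0, 250, ..., 2000
labels = list(range(8))             # bin indices 0..7 per axis

# Half-open bins; assumes 0 <= x, y < 2000.
df["x_bin"] = pd.cut(df["x"], bins, labels=labels, right=False)
df["y_bin"] = pd.cut(df["y"], bins, labels=labels, right=False)

# Categorical columns don't support arithmetic, so convert the labels to integers first.
df["group"] = df["x_bin"].astype(np.int8) + df["y_bin"].astype(np.int8).multiply(8)

print(df)
```

For example, row 3 (x = 550, y = 1000) lands in x_bin 2 and y_bin 4, so its group is 2 + 8 * 4 = 34.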

Note that the .astype(np.int8) calls are needed because pd.cut returns a categorical Series, which does not support arithmetic; converting the bin labels to small integers makes the addition and multiplication possible. If you don't want to store the intermediate bin assignments, you can do all of this in one line by substituting the pd.cut(...) expressions from the earlier lines for the column references in the last line:

df['group'] = pd.cut(df['x'], bins, labels=labels, right=False).astype(np.int8) + pd.cut(df['y'], bins, labels=labels, right=False).astype(np.int8).multiply(8)
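Because all 64 cells have the same width, the same row-major numbering can also be computed with integer floor division instead of pd.cut. This is a swapped-in alternative, not the answer's method, sketched under the assumption that all values are non-negative and at most 2000; the names cell, n_cols and cell_bounds are hypothetical. It also shows how to decode a group number back into its cell bounds and how to count rows per group with a plain groupby, so no Python loops are needed.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({"x": [10, 270, 550, 658, 364, 74], "y": [100, 60, 1000, 1900, 810, 1890]})

cell = 250   # hypothetical name: bin width from the question
n_cols = 8   # hypothetical name: 2000 / 250 bins per axis

# Floor division gives the bin index directly for equal-width bins (assumes values >= 0).
# clip() folds a value of exactly 2000 into the last bin instead of creating a bin 8.
x_bin = (df["x"] // cell).clip(upper=n_cols - 1)
y_bin = (df["y"] // cell).clip(upper=n_cols - 1)
df["group"] = x_bin + n_cols * y_bin  # same numbering as x_bin + 8 * y_bin


def cell_bounds(group):
    """Hypothetical helper: return ((x_lo, x_hi), (y_lo, y_hi)) for a group number."""
    row, col = divmod(group, n_cols)
    return (col * cell, (col + 1) * cell), (row * cell, (row + 1) * cell)


# Number of rows in each occupied group.
counts = df.groupby("group").size()
print(df)
print(counts)
print(cell_bounds(34))
```

With plain integer columns this gives the same groups as the pd.cut version for values below 2000; pd.cut is the more general choice if the bins ever stop being equal width.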
