Are values in one dataframe in bins of another dataframe?

Problem description

I have a dataframe named loc_df with two columns of bins that looks like this...

> loc_df

loc_x_bin        loc_y_bin      
(-20, -10]        (0, 50]           
(-140, -130]      (100, 150]        
(0,  10]          (-50, 0]          

I have another dataframe called data that looks like this...

> data

  loc_x         loc_y  
   -15            25
    30            35
    5            -45
   -135          -200

I want to add a new boolean column to data that shows whether loc_x falls within a loc_x_bin and loc_y falls within a loc_y_bin of the dataframe loc_df. Both bins must come from the same row of loc_df. For example (the last row is an extra test case, explained in the update below):

> data

 loc_x          loc_y         in_bins
  -15             25             true
   30             35             false
   5             -45             true
  -135           -200            false
   5              25             false**

UPDATE: ** Although 5 is within the (0, 10] loc_x_bin and 25 is within the (0, 50] loc_y_bin, those two bins are not in the same row of loc_df, so I want this to be false.

Solution

UPDATE2: if you want to check that both x and y belong to bins from the same row in df_loc (called loc_df in the question; df below is the question's data):

import numpy as np
import pandas as pd

xstep = 10   # width of the loc_x bins
ystep = 50   # width of the loc_y bins

In [201]: (df.assign(bin=(pd.cut(df.loc_x, np.arange(-500, 500, xstep)).astype(str)
   .....:                 +
   .....:                 pd.cut(df.loc_y, np.arange(-500, 500, ystep)).astype(str)
   .....:                )
   .....:           )
   .....: )['bin'].isin(df_loc.sum(axis=1))
Out[201]:
0     True
1    False
2     True
3    False
4    False
Name: bin, dtype: bool

Explanation:

In [202]: (df.assign(bin=(pd.cut(df.loc_x, np.arange(-500, 500, xstep)).astype(str)
   .....:                 +
   .....:                 pd.cut(df.loc_y, np.arange(-500, 500, ystep)).astype(str)
   .....:                )
   .....:           )
   .....: )
Out[202]:
   loc_x  loc_y                       bin
0    -15     25         (-20, -10](0, 50]
1     30     35           (20, 30](0, 50]
2      5    -45           (0, 10](-50, 0]
3   -135   -200  (-140, -130](-250, -200]
4      5     25            (0, 10](0, 50]

In [203]: df_loc.sum(axis=1)
Out[203]:
0         (-20, -10](0, 50]
1    (-140, -130](100, 150]
2           (0, 10](-50, 0]
dtype: object
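
The session above can be condensed into a self-contained script. This is a minimal sketch, not the answerer's exact code: it assumes the bins in df_loc are stored as plain strings formatted exactly as pd.cut prints them (one space after the comma), and same_row_in_bins with its lo and hi parameters is a hypothetical helper name introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frames from the question. Assumption: bins are plain strings
# whose formatting matches the labels produced by pd.cut.
df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135, 5],
    "loc_y": [25, 35, -45, -200, 25],
})

xstep, ystep = 10, 50


def same_row_in_bins(df, df_loc, xstep, ystep, lo=-500, hi=500):
    """Hypothetical helper: True where (loc_x, loc_y) matches one row's bin pair."""
    # Bin each coordinate on a regular grid and turn the interval into its label.
    x_label = pd.cut(df["loc_x"], np.arange(lo, hi, xstep)).astype(str)
    y_label = pd.cut(df["loc_y"], np.arange(lo, hi, ystep)).astype(str)
    # One concatenated key per df_loc row, e.g. "(-20, -10](0, 50]".
    keys = df_loc["loc_x_bin"] + df_loc["loc_y_bin"]
    # A data row matches only if its combined key equals some row's key.
    return (x_label + y_label).isin(keys)


df["in_bins"] = same_row_in_bins(df, df_loc, xstep, ystep)
print(df)
```

Concatenating the two labels into one key is what enforces the same-row requirement: (5, 25) produces "(0, 10](0, 50]", which is not a key of any df_loc row even though each half appears somewhere in df_loc.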

UPDATE: if you want to check whether x belongs to loc_x_bin and y belongs to loc_y_bin (not necessarily from the same row in df_loc):

If df_loc.dtypes doesn't show category for both columns, you may want to convert both bin columns to category dtype first:

df_loc.loc_x_bin = df_loc.loc_x_bin.astype('category')
df_loc.loc_y_bin = df_loc.loc_y_bin.astype('category')

Then you can categorize the columns of df "on the fly":

xstep = 10
ystep = 50

df['in_bins'] = (   (pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin))
                    &
                    (pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin))
                )

Test:

In [130]: df['in_bins'] = (   (pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin))
   .....:                     &
   .....:                     (pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin))
   .....:                 )

In [131]: df
Out[131]:
   loc_x  loc_y in_bins
0    -15     25    True
1     30     35   False
2      5    -45    True
3   -135   -200   False
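
For comparison, here is a self-contained sketch of this looser check, where each coordinate is tested independently. It assumes the bins in df_loc are stored as plain strings. As far as I know, recent pandas versions make pd.cut return Interval categories rather than strings, so this sketch converts the labels with astype(str) before calling isin; that conversion is an addition and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample frames from the question. Assumption: bins are plain strings.
df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135, 5],
    "loc_y": [25, 35, -45, -200, 25],
})

xstep, ystep = 10, 50

# Label each coordinate with the grid interval it falls into, as a string.
x_label = pd.cut(df["loc_x"], np.arange(-500, 500, xstep)).astype(str)
y_label = pd.cut(df["loc_y"], np.arange(-500, 500, ystep)).astype(str)

# Each coordinate only has to match *some* bin in its own column.
df["in_bins"] = x_label.isin(df_loc["loc_x_bin"]) & y_label.isin(df_loc["loc_y_bin"])
print(df)
```

With this variant the extra row (5, 25) comes out True, because (0, 10] and (0, 50] both occur in df_loc, just not in the same row. That is the case the same-row check in UPDATE2 is meant to reject.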
