Are values in one dataframe in bins of another dataframe?
Problem description
I have a dataframe named loc_df with two columns of bins that looks like this...
> loc_df
loc_x_bin loc_y_bin
(-20, -10] (0, 50]
(-140, -130] (100, 150]
(0, 10] (-50, 0]
I have another dataframe called data that looks like this...
> data
loc_x loc_y
-15 25
30 35
5 -45
-135 -200
I want to add a new boolean column to data that shows whether loc_x falls within loc_x_bin and loc_y falls within loc_y_bin of the dataframe loc_df. loc_x and loc_y must fall in the loc_x_bin and loc_y_bin of the same row. For example:
> data
loc_x loc_y in_bins
-15 25 true
30 35 false
5 -45 true
-135 -200 false
5 25 false**
UPDATE
**Although 5 is within the (0, 10] loc_x_bin and 25 is within the (0, 50] loc_y_bin, those two bins are not in the same row of loc_df, so I want this to be false.
UPDATE2: if you want to check that both x and y belong to bins from the same row of df_loc (the question's loc_df; df below is the question's data):
xstep = 10
ystep = 50
In [201]: (df.assign(bin=(pd.cut(df.loc_x, np.arange(-500, 500, xstep)).astype(str)
.....: +
.....: pd.cut(df.loc_y, np.arange(-500, 500, ystep)).astype(str)
.....: )
.....: )
.....: )['bin'].isin(df_loc.sum(axis=1))
Out[201]:
0 True
1 False
2 True
3 False
4 False
Name: bin, dtype: bool
Explanation:
In [202]: (df.assign(bin=(pd.cut(df.loc_x, np.arange(-500, 500, xstep)).astype(str)
.....: +
.....: pd.cut(df.loc_y, np.arange(-500, 500, ystep)).astype(str)
.....: )
.....: )
.....: )
Out[202]:
loc_x loc_y bin
0 -15 25 (-20, -10](0, 50]
1 30 35 (20, 30](0, 50]
2 5 -45 (0, 10](-50, 0]
3 -135 -200 (-140, -130](-250, -200]
4 5 25 (0, 10](0, 50]
In [203]: df_loc.sum(axis=1)
Out[203]:
0 (-20, -10](0, 50]
1 (-140, -130](100, 150]
2 (0, 10](-50, 0]
dtype: object
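The session above can be condensed into a standalone script. This is only a sketch under a few assumptions: the bins in df_loc are stored as plain strings, they were produced on the same grid (steps of 10 and 50 starting at -500), and the two label columns are joined with + rather than df_loc.sum(axis=1), which should be equivalent for string columns. The sample frames are rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question; bins are ASSUMED to be stored as plain strings.
df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135, 5],
    "loc_y": [25, 35, -45, -200, 25],
})

xstep, ystep = 10, 50

# Bin each coordinate on the same grid that is assumed to have produced df_loc,
# then convert the interval labels to strings such as "(-20, -10]".
x_label = pd.cut(df["loc_x"], np.arange(-500, 500, xstep)).astype(str)
y_label = pd.cut(df["loc_y"], np.arange(-500, 500, ystep)).astype(str)

# Build one combined key per row on each side. A match means that both bins
# come from the same row of df_loc.
row_keys = df_loc["loc_x_bin"] + df_loc["loc_y_bin"]
df["in_bins"] = (x_label + y_label).isin(row_keys)

print(df)
```

The last row (5, 25) comes out False here because "(0, 10](0, 50]" is not a key of any single df_loc row, which is the behaviour the question asks for.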
UPDATE: if you want to check whether x belongs to any loc_x_bin and y belongs to any loc_y_bin (not necessarily from the same row in df_loc):
If df_loc.dtypes doesn't show category for both columns, then you may want to convert those columns to category dtype first:
df_loc.loc_x_bin = df_loc.loc_x_bin.astype('category')
df_loc.loc_y_bin = df_loc.loc_y_bin.astype('category')
Then you can categorize the columns of df "on the fly":
xstep = 10
ystep = 50
df['in_bins'] = ( (pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin))
&
(pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin))
)
Test:
In [130]: df['in_bins'] = ( (pd.cut(df.loc_x, np.arange(-500, 500, xstep)).isin(df_loc.loc_x_bin))
.....: &
.....: (pd.cut(df.loc_y, np.arange(-500, 500, ystep)).isin(df_loc.loc_y_bin))
.....: )
In [131]: df
Out[131]:
loc_x loc_y in_bins
0 -15 25 True
1 30 35 False
2 5 -45 True
3 -135 -200 False
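For comparison, here is a self-contained sketch of this column-by-column check. It assumes the bins in df_loc are plain strings, so the pd.cut labels are converted to strings before isin (a variation on the code above, which compares the categorical labels directly). A fifth row (5, 25) from the question's update is included to show how this check differs from the same-row check.

```python
import numpy as np
import pandas as pd

# Sample data from the question; bins are ASSUMED to be stored as plain strings.
df_loc = pd.DataFrame({
    "loc_x_bin": ["(-20, -10]", "(-140, -130]", "(0, 10]"],
    "loc_y_bin": ["(0, 50]", "(100, 150]", "(-50, 0]"],
})
df = pd.DataFrame({
    "loc_x": [-15, 30, 5, -135, 5],
    "loc_y": [25, 35, -45, -200, 25],
})

xstep, ystep = 10, 50

# Label every value with its bin, as a string comparable to df_loc's entries.
x_label = pd.cut(df["loc_x"], np.arange(-500, 500, xstep)).astype(str)
y_label = pd.cut(df["loc_y"], np.arange(-500, 500, ystep)).astype(str)

# Each coordinate is tested against its own column only, so the matching
# x bin and y bin may come from different rows of df_loc.
df["in_bins"] = x_label.isin(df_loc["loc_x_bin"]) & y_label.isin(df_loc["loc_y_bin"])

print(df)
```

With this version the row (5, 25) is flagged True, since (0, 10] and (0, 50] both appear in df_loc, just not in the same row. That is why the same-row variant is the one that matches the question's requirement.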