Check if one Series is a subset of another in Pandas
Question
I have 2 columns from 2 different dataframes. I want to check if column 1 is a subset of column 2.
I used the following code:
set(col1).issubset(set(col2))
The issue with this is that if col1 has only integers and col2 has both integers and strings, it returns False. This happens because the elements of col2 are coerced into strings, so the integer 376 never matches the string '376'. For example, the following intersection is empty:
set([376, 264, 365, 302]) &
set(['302', 'water', 'nist1950', '264', '365', '376'])
I tried using isin from pandas. But if col1 and col2 are Series, this gives a Series of Boolean values. I want a single True or False.
How do I solve this? Is there a simpler function that I have missed?
Edit 1
Adding an example.
col1
0 365
1 376
2 302
3 264
Name: subject, dtype: int64
col2
0 nist1950
1 nist1950
2 water
3 water
4 376
5 376
6 302
7 302
8 365
9 365
10 264
11 264
12 376
13 376
Name: subject, dtype: object
Edit 2
col1 and col2 can have integers, strings, floats etc. I would like to not make any prejudgement about what is in these columns.
Recommended answer
You could use isin with all to check whether all of your col1 elements are contained in col2. To convert to numeric you could use pd.to_numeric:
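import pandas as pd

s1 = pd.Series([376, 264, 365, 302])
s2 = pd.Series(['302', 'water', 'nist1950', '264', '365', '376'])
# Non-numeric strings in s2 become NaN; all() reduces the Boolean Series to one value
res = s1.isin(pd.to_numeric(s2, errors='coerce')).all()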
In [213]: res
Out[213]: True
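If this check is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: the name is_numeric_subset is hypothetical (it is not part of pandas), and it assumes that values which cannot be parsed as numbers should never count as matches.

```python
import pandas as pd


def is_numeric_subset(col1: pd.Series, col2: pd.Series) -> bool:
    """Return True if every value of col1 appears in col2 after numeric coercion.

    Hypothetical helper, not a pandas API. Assumption: entries that cannot
    be parsed as numbers are treated as non-matching.
    """
    # Non-numeric entries (e.g. 'water') become NaN and are dropped from col2
    candidates = pd.to_numeric(col2, errors="coerce").dropna()
    # A non-numeric entry in col1 becomes NaN, which is not in candidates
    values = pd.to_numeric(col1, errors="coerce")
    # isin gives a Boolean Series; all() collapses it to a single value
    return bool(values.isin(candidates).all())


s1 = pd.Series([376, 264, 365, 302])
s2 = pd.Series(['302', 'water', 'nist1950', '264', '365', '376'])
print(is_numeric_subset(s1, s2))
```

Wrapping the result in bool() returns a plain Python True or False rather than a NumPy Boolean, which is what the question asks for.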
In more detail:
In [214]: pd.to_numeric(s2, errors='coerce')
Out[214]:
0 302
1 NaN
2 NaN
3 264
4 365
5 376
dtype: float64
In [215]: s1.isin(pd.to_numeric(s2, errors='coerce'))
Out[215]:
0 True
1 True
2 True
3 True
dtype: bool
Note: pd.to_numeric works with pandas version >=0.17.0. For earlier versions you could use convert_objects with convert_numeric=True.
Edit
If you prefer a solution with set, you could convert your first set to str as well and then compare them with your code:
s3 = set(map(str, s1))
In [234]: s3
Out[234]: {'264', '302', '365', '376'}
Then you could use issubset with s2:
In [235]: s3.issubset(s2)
Out[235]: True
Or with set(s2):
In [236]: s3.issubset(set(s2))
Out[236]: True
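The set-based comparison above can be applied to both columns at once, so that neither side needs a known dtype. This is a minimal sketch under one assumption: two values count as equal when their str() forms are identical. The name is_subset_as_str is hypothetical.

```python
import pandas as pd


def is_subset_as_str(col1, col2) -> bool:
    """Compare two columns by the string form of their elements.

    Hypothetical helper. Assumption: 376 and '376' should match, while
    376.0 and '376' should not, because str(376.0) is '376.0'.
    """
    return set(map(str, col1)).issubset(set(map(str, col2)))


s1 = pd.Series([376, 264, 365, 302])
s4 = pd.Series(['nist1950', 'nist1950', 'water', 'water', '376', '376',
                '302', '302', '365', '365', '264', '264', '376', '376'])
print(is_subset_as_str(s1, s4))
```

Unlike the numeric approach, this also handles non-numeric values such as 'water' in col1, but it treats floats and integers with the same value as different. Which behavior is preferable depends on the data.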
Edit 2
s1 = pd.Series(['376', '264', '365', '302'])
s4 = pd.Series(['nist1950', 'nist1950', 'water', 'water', '376', '376', '302', '302', '365', '365', '264', '264', '376', '376'])
In [263]: s1.astype(float).isin(pd.to_numeric(s4, errors='coerce')).all()
Out[263]: True