Pandas series case-insensitive matching and partial matching between values
Question
I have the following operation, which adds a status showing whether each string in a column of one DataFrame is present in a specified column of another DataFrame. It looks like this:
df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')
This won't match if the strings differ in case. Is it possible to perform this operation case-insensitively?
Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? For example: df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'
Recommended answer
You can do the first test by converting both columns to lowercase or uppercase (either works) inside the expression. Because neither column is reassigned back to its DataFrame, the case conversion is only temporary:
df_one['Status'] = np.where(
    df_one.A.str.lower().isin(df_two.A.str.lower()),
    'Matched', 'Unmatched')
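As a minimal, self-contained sketch of the case-insensitive test, the snippet below runs the same expression on a pair of small DataFrames. The sample values are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question
df_one = pd.DataFrame({'A': ['Apple', 'BANANA', 'cherry']})
df_two = pd.DataFrame({'A': ['apple', 'banana', 'kiwi']})

# Compare lowercase copies; the original columns are left unchanged
mask = df_one['A'].str.lower().isin(df_two['A'].str.lower())
df_one['Status'] = np.where(mask, 'Matched', 'Unmatched')

print(df_one)
```

If the data may contain non-ASCII text, `str.casefold()` could be swapped in for `str.lower()` on both sides; whether that matters depends on your data.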
You can perform the second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):
# Lowercase the suffixes once; Python's str.endswith accepts a tuple of candidates
suffixes = tuple(df_two.A.str.lower())
df_one['Endswith_Status'] = np.where(
    df_one.A.str.lower().apply(lambda x: x.endswith(suffixes)),
    'Matched', 'Unmatched')
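The suffix test can be sketched end to end as follows. The sample data and the helper name `ends_with_any` are hypothetical, and the handling of missing values is an added assumption that the original answer does not address.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the first row mirrors the example in the question
df_one = pd.DataFrame({'A': ['abcdefghIJKL', 'xyz', 'foo-bar', None]})
df_two = pd.DataFrame({'A': ['ijkl', 'BAR', None]})

# Build the lowercase suffixes once; dropna() skips missing values,
# which would otherwise make str.endswith raise a TypeError
suffixes = tuple(df_two['A'].dropna().str.lower())

def ends_with_any(value):
    # Hypothetical helper: non-string values (e.g. NaN) never match
    return isinstance(value, str) and value.endswith(suffixes)

df_one['Endswith_Status'] = np.where(
    df_one['A'].str.lower().apply(ends_with_any), 'Matched', 'Unmatched')

print(df_one)
```

Passing a tuple to Python's built-in `str.endswith` checks all suffixes in one call, so the suffix column is lowercased only once rather than once per row.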