Pandas series case-insensitive matching and partial matching between values

Question

I have the following operation that adds a status column showing whether each string in a column of one DataFrame is present in a specified column of another DataFrame. It looks like this:

df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')

This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?

Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? e.g. df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'

Recommended answer

You can do the first test by converting both strings to lowercase or uppercase (either works) inside the expression (as you aren't reassigning either column back to your DataFrames, the case conversion is only temporary):

df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
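To see the case-insensitive test in action, here is a minimal, self-contained sketch. The sample values in df_one and df_two are made up for illustration, and it assumes column A holds only strings (no missing values) in both frames.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration
df_one = pd.DataFrame({'A': ['Apple', 'BANANA', 'cherry', 'date']})
df_two = pd.DataFrame({'A': ['apple', 'Banana', 'Elderberry']})

# Lowercase both sides only inside the expression;
# the stored columns keep their original case
is_match = df_one.A.str.lower().isin(df_two.A.str.lower())
df_one['Status'] = np.where(is_match, 'Matched', 'Unmatched')

print(df_one)
```

Because the lowercased Series are temporary, column A in both DataFrames is left exactly as it was.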

You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):

df_one['Endswith_Status'] = np.where(df_one.A.str.lower().apply(
                                      lambda x: any(x.endswith(i) for i in df_two.A.str.lower())),
                                      'Matched', 'Unmatched')
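A possible variation on the suffix test, sketched under the same assumptions (made-up sample data, string-only columns): Python's built-in str.endswith accepts a tuple of suffixes, so the lowercased values of df_two.A can be computed once instead of on every row. The names suffixes and ends_with_any are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration
df_one = pd.DataFrame({'A': ['abcdefghIJKL', 'xyz', 'MNOP']})
df_two = pd.DataFrame({'A': ['ijkl', 'mnop', 'qrs']})

# Build the lowercased suffixes once; str.endswith accepts a tuple of candidates
suffixes = tuple(df_two.A.str.lower())

# True where the lowercased value ends with any of the suffixes
ends_with_any = df_one.A.str.lower().apply(lambda s: s.endswith(suffixes))
df_one['Endswith_Status'] = np.where(ends_with_any, 'Matched', 'Unmatched')

print(df_one)
```

Note that an empty string in df_two.A would match every row, since every string ends with the empty string, so it may be worth filtering such values out first.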
