How to check if Pandas rows contain any full string or substring of a list?

Problem description

I have a list of strings:

list_ = ['abc', 'def', 'xyz']

I also have a DataFrame df with a column CheckCol. I want to check whether each value in CheckCol contains any of the list elements, either as the entire value or as a substring.

If it does, I want to copy the original value into a new column, NewCol.

CheckCol
'a'
'ab'
'abc'
'abc-de'

I want to turn this into:

# What I want
CheckCol        NewCol
'a'
'ab'
'abc'           'abc'
'abc-de'        'abc-de'

However, my code below only matches the exact full string, not the substrings I am looking for:

df['NewCol'] = np.where(df['CheckCol'].isin(list_), df['CheckCol'], '')

which gives:

# What I get
CheckCol        NewCol
'a'
'ab'
'abc'           'abc'
'abc-de'       
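To see why isin() falls short: it tests each value for equality with the list elements, whereas the requirement is a substring test. The sketch below rebuilds the question's sample data and, purely for comparison, adds a plain-Python any(...) substring check next to isin(); 'abc-de' is the row where the two disagree.

```python
import pandas as pd

list_ = ['abc', 'def', 'xyz']
df = pd.DataFrame({'CheckCol': ['a', 'ab', 'abc', 'abc-de']})

# isin() compares each whole value for equality with the list elements
exact = df['CheckCol'].isin(list_)

# A substring test asks whether any list element occurs inside the value
substring = df['CheckCol'].apply(lambda value: any(sub in value for sub in list_))

print(pd.DataFrame({'CheckCol': df['CheckCol'], 'isin': exact, 'substring': substring}))
```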

Edit: the variable list was renamed to list_.

Recommended answer

I think the "easiest" solution to implement is a regular expression. In regex, the pipe | means OR, so '|'.join(list_) builds a single pattern that matches any of the substrings we want to check for. Series.str.contains treats its pattern as a regex by default, so it can use this pattern directly.
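A quick check of what the joined pattern looks like, using Python's re module directly. One caveat the answer does not mention: the list elements are interpreted as regex syntax, so an element containing metacharacters such as . or + is not matched literally. Escaping each element with re.escape avoids that. The elements 'a.c' and 'd+f' below are hypothetical, chosen only to show the difference.

```python
import re

list_ = ['abc', 'def', 'xyz']

# '|' is regex alternation: the pattern matches 'abc' OR 'def' OR 'xyz'
pattern = '|'.join(list_)
print(pattern)                               # abc|def|xyz
print(bool(re.search(pattern, 'abc-de')))    # True: contains 'abc'
print(bool(re.search(pattern, 'ab')))        # False: no element occurs in 'ab'

# Hypothetical elements containing regex metacharacters
risky = ['a.c', 'd+f']

# Unescaped, '.' matches any character, so 'a.c' matches 'abc'
print(bool(re.search('|'.join(risky), 'abc')))   # True

# re.escape makes each element match literally
safe_pattern = '|'.join(map(re.escape, risky))
print(safe_pattern)                              # a\.c|d\+f
print(bool(re.search(safe_pattern, 'abc')))      # False
```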

import pandas as pd
import numpy as np

list_ = ['abc', 'def', 'xyz']

df = pd.DataFrame({
    'CheckCol': ['a','ab','abc','abd-def']
})

df['NewCol'] = np.where(df['CheckCol'].str.contains('|'.join(list_)), df['CheckCol'], '')

print(df)

#  CheckCol   NewCol
#0        a         
#1       ab         
#2      abc      abc
#3  abd-def  abd-def
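A variant of the same approach that stays within pandas, sketched on the question's original data ('abc-de' rather than 'abd-def'). Two details are additions, not part of the answer above: each element is passed through re.escape so it is matched literally, and Series.where replaces np.where, keeping a value where the mask is True and filling '' elsewhere. na=False makes missing values count as non-matches instead of putting NaN into the mask.

```python
import re
import pandas as pd

list_ = ['abc', 'def', 'xyz']
df = pd.DataFrame({'CheckCol': ['a', 'ab', 'abc', 'abc-de']})

# Escape each element so it is matched literally, then OR them together
pattern = '|'.join(map(re.escape, list_))

# Boolean mask: True where the value contains at least one list element;
# na=False treats missing values as non-matches
mask = df['CheckCol'].str.contains(pattern, regex=True, na=False)

# Series.where keeps values where mask is True and fills '' elsewhere
df['NewCol'] = df['CheckCol'].where(mask, '')
print(df)
```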


NOTE: Your variable name list was changed to list_. Try to avoid reusing the names of Python built-ins such as list; assigning to them shadows the built-in.
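A short illustration of the problem the note refers to, run as a module-level script: once the name list is bound to a list object, calling list(...) in the same scope fails. The builtins module still exposes the original type, and deleting the shadowing name restores normal lookup.

```python
import builtins

# Assigning to the name `list` shadows the built-in type in this scope
list = ['abc', 'def', 'xyz']

try:
    list('abc')  # tries to call the list object, not the built-in type
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'list' object is not callable

# The built-in is still reachable through the builtins module
print(builtins.list('abc'))  # ['a', 'b', 'c']

# Remove the shadowing name; `list` resolves to the built-in again
del list
print(list('abc'))  # ['a', 'b', 'c']
```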
