PANDAS find exact given string/word from a column


Problem description

I have a pandas column named Notes, which contains a sentence or explanation of some event. I am trying to find certain given words in that column, and when I find one, I add it to the next column as Type.

The problem is that for some words, for example Liar and Lies, it also picks up words like familiar and families, because they contain liar and lies as substrings.

Notes                                  Type
2 families are living in the address   Lies
He is a liar                           Liar
We are not familiar with this          Liar

As you can see above, only the second sentence is labeled correctly. How do I pick up only standalone words like liar and lies, and not families or familiar?

Here is my approach:

words = ["Lies"]

for i in range(len(df)):
    for w in words:
        # plain substring test: also matches inside longer words
        if w in df["Notes"][i]:
            df.loc[i, "Type"] = "Lies"

Any help is appreciated. Thanks.

Recommended answer

Use \b for a word boundary in the regex, and .str.extract to find the pattern:

df.Notes.str.extract(r'\b(lies|liar)\b')
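
As a minimal sketch of how this behaves on the sample rows from the question, the snippet below rebuilds the Notes column and extracts the matched word. The `flags=re.IGNORECASE` argument and the `.str.capitalize()` step are assumptions added here so that capitalized forms match and the label reads like the Type values in the question; they are not part of the original answer.

```python
import re
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({"Notes": [
    "2 families are living in the address",
    "He is a liar",
    "We are not familiar with this",
]})

# \b requires a word boundary on both sides, so "liar" inside "familiar"
# and "lies" inside "families" are not matched.
# flags=re.IGNORECASE is an added assumption, not part of the original answer.
matches = df["Notes"].str.extract(
    r"\b(lies|liar)\b", flags=re.IGNORECASE, expand=False
)

# Store the matched word itself as the label; rows without a match stay NaN.
df["Type"] = matches.str.capitalize()
```

With `expand=False` and a single capture group, `.str.extract` returns a Series, which is convenient for assigning straight into one column.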

To label the rows containing one of those words, do:

import numpy as np
df['Type'] = np.where(df.Notes.str.contains(r'\b(?:lies|liar)\b'), 'Lies', 'Not Lies')
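
The labeling step can also be sketched end to end with a keyword list. In the example below, `keywords`, `pattern`, and `has_word` are hypothetical names introduced for illustration, and `case=False` and `na=False` are assumptions: they make the match case-insensitive and treat missing notes as non-matches. A non-capturing group `(?:...)` is used because `.str.contains` warns when the pattern has capture groups.

```python
import re
import numpy as np
import pandas as pd

# Sample rows from the question, plus one missing note to show na handling.
df = pd.DataFrame({"Notes": [
    "2 families are living in the address",
    "He is a liar",
    "We are not familiar with this",
    None,
]})

keywords = ["lies", "liar"]  # hypothetical keyword list

# Build \b(?:lies|liar)\b; re.escape guards against regex metacharacters
# in the keywords.
pattern = r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b"

# case=False and na=False are assumptions added for this sketch.
has_word = df["Notes"].str.contains(pattern, case=False, na=False)
df["Type"] = np.where(has_word, "Lies", "Not Lies")
```

Building the pattern from a list keeps the word-boundary anchors in one place when more keywords are added later.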
