Pandas: find an exact given string/word in a column
Question
I have a pandas column named Notes which contains a sentence or explanation of some event. I am trying to find certain given words in that column, and when I find a word I add it to the next column, Type.
The problem is that for some specific words, for example Liar and Lies, it also picks up words like familiar and families, because they contain liar and lies.
Notes                                   Type
2 families are living in the address    Lies
He is a liar                            Liar
We are not familiar with this           Liar
As you can see above, only the second row is correct. How do I pick up only standalone words like liar and lies, and not families or familiar?
Here is my approach:
word = ["Lies"]
for i in range(0, len(df)):
    for f in word:
        if f in df["Notes"][i]:
            df["Type"][i] = "Lies"
Any help is appreciated. Thanks.
Recommended answer
Use \b for a word boundary in the regex, and .str.extract to find the pattern:
df.Notes.str.extract(r'\b(lies|liar)\b')
To label the rows that contain one of those words, do:
import numpy as np
df['Type'] = np.where(df.Notes.str.contains(r'\b(?:lies|liar)\b'), 'Lies', 'Not Lies')
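For a self-contained check, the sketch below rebuilds the three example rows from the question and applies the same \b pattern with .str.extract. The keyword list `words`, the use of re.escape, and the case-insensitive flag are assumptions added for illustration and are not part of the original answer.

```python
import re
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Notes": [
        "2 families are living in the address",
        "He is a liar",
        "We are not familiar with this",
    ]
})

# Hypothetical keyword list; re.escape guards against regex metacharacters
words = ["lies", "liar"]
pattern = r"\b(" + "|".join(map(re.escape, words)) + r")\b"

# expand=False returns a Series; rows without a whole-word match get NaN
df["Type"] = df["Notes"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```

With this pattern, "families" and "familiar" are not matched, because there is no word boundary before the embedded "lies" or "liar". Only the second row gets a value in Type.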