Finding occurrences of a word in a string in python 3


Question


I'm trying to find the number of occurrences of a word in a string.

word = "dog"
str1 = "the dogs barked"

I used the following to count the occurrences:

count = str1.count(word)

The issue is I want an exact match. So the count for this sentence would be 0. Is that possible?
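A small illustration of the behavior described above, using the question's own word and str1 values. str.count counts non-overlapping substring occurrences, so it has no notion of whole words. The name substring_count is just for illustration.

```python
word = "dog"
str1 = "the dogs barked"

# str.count matches substrings, so the "dog" inside "dogs" is counted
# even though "dog" never appears as a whole word in str1.
substring_count = str1.count(word)
print(substring_count)  # 1
```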

Solution

If you're going for efficiency:

import re
# \b anchors the escaped word at word boundaries, so "dogs" is not counted as "dog"
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))

This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
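As a minimal sketch, the one-liner can be wrapped in a reusable function. The names count_word and count_word_split are hypothetical. The split-based version is included only for comparison with the approach the answer argues against. Precompiling the pattern with re.compile is an optional choice and not part of the original answer.

```python
import re


def count_word(word: str, text: str) -> int:
    """Count whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape neutralises regex metacharacters in `word`; \b anchors both ends.
    pattern = re.compile(r'\b%s\b' % re.escape(word))
    # finditer yields match objects lazily, so no list of matches is built.
    return sum(1 for _ in pattern.finditer(text))


def count_word_split(word: str, text: str) -> int:
    """Split-based alternative for comparison: builds a list of all tokens."""
    return text.split().count(word)


print(count_word("dog", "the dogs barked"))                # 0
print(count_word("dog", "the dog barked at a dog"))        # 2
print(count_word_split("dog", "the dog barked at a dog"))  # 2
```

On plain whitespace-separated text both functions agree. The difference lies in memory use, and in punctuation handling.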

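It also has the benefit of working correctly with punctuation: it properly returns 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses \b, a regex special sequence rather than a flag, which matches at word boundaries. A word boundary is a position between a \w character and a non-word character, or the start or end of the string next to a \w character. For str patterns in Python 3, \w matches any Unicode word character. It is equivalent to [a-zA-Z0-9_] only when the re.ASCII flag is set.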
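The punctuation case above can be checked directly. This sketch counts "dog" in "Mike saw a dog." both ways. The variable names are illustrative only.

```python
import re

sentence = "Mike saw a dog."
word = "dog"

# "." is not a \w character, so \b matches between "g" and ".".
regex_count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), sentence))

# Argumentless split() splits on whitespace only, so the last token is "dog.".
split_count = sentence.split().count(word)

print(regex_count)  # 1
print(split_count)  # 0
```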

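If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication. Note that in Python 3, str patterns already use Unicode matching by default (re.UNICODE is implied), so \w and \b treat letters such as é as word characters. The re.ASCII flag restricts them to ASCII, and re.LOCALE is only accepted for bytes patterns.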
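To see how the flags affect \b, here is a small sketch comparing the default Unicode behavior of a Python 3 str pattern with re.ASCII. The sample text and the search fragment "caf" are invented for illustration. In ASCII mode é counts as a non-word character, so \b fires in the middle of "café".

```python
import re

text = "café au lait, cafés"
word = "caf"
pattern = r'\b%s\b' % re.escape(word)

# Default (Unicode) str pattern: é is a word character, so there is no
# boundary between "f" and "é" and "caf" never matches as a whole word.
default_count = sum(1 for _ in re.finditer(pattern, text))

# With re.ASCII, é is not a word character, so \b matches after "caf"
# inside both "café" and "cafés".
ascii_count = sum(1 for _ in re.finditer(pattern, text, re.ASCII))

print(default_count)  # 0
print(ascii_count)    # 2
```

For most non-ASCII text, the default Unicode behavior is the one you want.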
