Using BeautifulSoup to search HTML for string
Problem description
I am using BeautifulSoup to look for user-entered strings on a specific page. For example, I want to see if the string 'Python' is located on the page: http://python.org
When I used
find_string = soup.body.findAll(text='Python')
find_string was [] (an empty list).
But when I used
find_string = soup.body.findAll(text=re.compile('Python'), limit=1)
find_string was [u'Python Jobs'], as expected.
What is the difference between these two statements that makes the second one work when the word being searched for occurs more than once on the page?
The following line is looking for the exact NavigableString 'Python':
>>> soup.body.findAll(text='Python')
[]
Note that the following NavigableString is found:
>>> soup.body.findAll(text='Python Jobs')
[u'Python Jobs']
Note this behaviour:
>>> import re
>>> soup.body.findAll(text=re.compile('^Python$'))
[]
So your regexp matches any NavigableString that contains 'Python', whereas the plain-string form only matches a NavigableString that is exactly 'Python'.
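To make the difference concrete, here is a minimal sketch that runs the same kinds of searches against a small inline document. The HTML snippet, the find_strings helper, and the user_input value are made up for illustration and are not taken from python.org. It also assumes the bs4 package (Beautiful Soup 4), where find_all is the current name for findAll and the string argument is the newer name for text.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; not the actual python.org markup.
HTML = """
<html><body>
<h1>Welcome to Python.org</h1>
<a href="/jobs">Python Jobs</a>
<p>Version 3.x (beta)</p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")


def find_strings(pattern):
    """Return the text nodes under <body> that match pattern, as plain str."""
    return [str(s) for s in soup.body.find_all(string=pattern)]


# A plain string must equal the whole text node, so this finds nothing.
exact = find_strings("Python")

# The same exact comparison succeeds when the whole node is given.
exact_full = find_strings("Python Jobs")

# A compiled regex is applied with search(), so any node containing it matches.
partial = find_strings(re.compile("Python"))

# Anchoring the regex reproduces the exact-match behaviour.
anchored = find_strings(re.compile("^Python$"))

# For user-entered text, escape regex metacharacters before compiling.
user_input = "3.x (beta)"
escaped = find_strings(re.compile(re.escape(user_input)))

# If only a yes/no answer is needed, a substring test on the text is enough.
on_page = "Python" in soup.body.get_text()

print(exact, exact_full, partial, anchored, escaped, on_page)
```

Since the question is about user-entered strings, the re.escape step matters: without it, characters such as '.' or '(' in the input would be interpreted as regex syntax rather than literal text.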