Python regular expression for HTML parsing (BeautifulSoup)
Problem description
I want to grab the value of a hidden input field in HTML.
<input type="hidden" name="fooId" value="12-3456789-1111111111" />
I want to write a regular expression in Python that returns the value of fooId, given that I know the line in the HTML follows this format:
<input type="hidden" name="fooId" value="**[id is here]**" />
Can someone provide an example in Python that parses the HTML for this value?
Recommended answer
For this particular case, BeautifulSoup is harder to write than a regex, but it is much more robust. I'm just contributing the BeautifulSoup example, given that you already know which regexp to use :-)
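# BeautifulSoup 4; the original `from BeautifulSoup import BeautifulSoup` is the Python 2-only version 3
from bs4 import BeautifulSoup
# Or retrieve it from the web, etc.
with open('/yourwebsite/page.html', 'r') as f:
    html_data = f.read()
# Create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')
# Find the proper tag. The name attribute goes in attrs because find() already
# uses its `name` parameter for the tag name, so name='fooId' raises a TypeError.
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})
# Index the attribute directly. In bs4, attrs is a dict, so attrs[2][1] no longer works.
value = fooId['value']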
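For comparison, the regex route that the answer alludes to could look roughly like the sketch below. It is not from the original answer. The names extract_foo_id, FOO_ID_PATTERN and sample are made up for illustration, and it assumes the line matches the format in the question exactly (same attribute order, double quotes, same spacing).

```python
import re

# Hypothetical names, for illustration only.
# Assumption: the attributes appear in exactly this order, double-quoted,
# with the same spacing as the line shown in the question.
FOO_ID_PATTERN = re.compile(
    r'<input type="hidden" name="fooId" value="([^"]*)" />'
)


def extract_foo_id(html):
    """Return the fooId value, or None if the expected line is not found."""
    match = FOO_ID_PATTERN.search(html)
    return match.group(1) if match else None


sample = '<input type="hidden" name="fooId" value="12-3456789-1111111111" />'
print(extract_foo_id(sample))
```

This is where the robustness point shows up: if the page reorders the attributes or switches to single quotes, the pattern stops matching and the function returns None, whereas the BeautifulSoup lookup would still find the tag.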