Python regular expression for HTML parsing (BeautifulSoup)


Question

I want to grab the value of a hidden input field in HTML.

<input type="hidden" name="fooId" value="12-3456789-1111111111" />

I want to write a regular expression in Python that will return the value of fooId, given that I know the line in the HTML follows the format

<input type="hidden" name="fooId" value="**[id is here]**" />

Can someone provide an example in Python to parse the HTML for the value?

Recommended answer

For this particular case, BeautifulSoup is harder to write than a regex, but it is much more robust... I'm just contributing with the BeautifulSoup example, given that you already know which regexp to use :-)

# BeautifulSoup 4 import; the legacy BeautifulSoup 3 form was "from BeautifulSoup import BeautifulSoup"
from bs4 import BeautifulSoup

# Or retrieve it from the web, etc.
with open('/yourwebsite/page.html', 'r') as f:
    html_data = f.read()

# Create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')
# Find the proper tag; attributes go in attrs, because a name= keyword
# would clash with find()'s own tag-name argument
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})
value = fooId['value']  # Index the tag directly to read its value attribute
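
For comparison, the regex approach the question asks about might look like the sketch below. It assumes the line follows exactly the format shown in the question (same attribute order, double quotes); the names FOO_ID_RE and extract_foo_id are made up for illustration. If the markup ever deviates from that format, the pattern will simply fail to match, which is the robustness trade-off mentioned above.

```python
import re

# Assumes the attribute order and double quotes shown in the question.
FOO_ID_RE = re.compile(
    r'<input\s+type="hidden"\s+name="fooId"\s+value="([^"]*)"\s*/?>'
)

def extract_foo_id(html):
    """Return the value of the hidden fooId input, or None if it is absent."""
    match = FOO_ID_RE.search(html)
    return match.group(1) if match else None

sample = '<input type="hidden" name="fooId" value="12-3456789-1111111111" />'
print(extract_foo_id(sample))  # prints 12-3456789-1111111111
```

The helper returns None when nothing matches, so a caller can tell a missing field apart from an empty value.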
