Beautiful Soup: Find Tags Based on Partial Attribute Value

Question

I am trying to identify tags in an HTML document based on part of an attribute value.

For example, if I have a BeautifulSoup object:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://My_Page")
soup = BeautifulSoup(r.text, "html.parser")

I want tr tags with an id attribute whose value is formatted like this: "news_4343_23255_xxx". I'm interested in any tr tag as long as the first 4 characters of its id attribute value are "news".

I know I can search as follows:

trs = soup.find_all("tr", attrs={"id": True})

which gives me all tr tags with an id attribute.

How do I search based on a substring?

Recommended Answer

Use a regex to get the tr tags whose id starts with "news".

For example:

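from bs4 import BeautifulSoup
import re

# html is the page source as a string, e.g. r.text from the question
soup = BeautifulSoup(html, "html.parser")
# ^news anchors the pattern at the start of the id value
for i in soup.find_all("tr", {"id": re.compile(r"^news")}):
    print(i)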
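
The same prefix match can probably also be written without a regex. The sketch below is not part of the original answer: it uses a CSS attribute selector through soup.select and a function filter passed to find_all, and it adds a "contains" selector for the case where the substring is not at the start. The sample markup is invented for illustration, and soup.select assumes a Beautiful Soup version that ships with the soupsieve selector library (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<table>
  <tr id="news_4343_23255_xxx"><td>first</td></tr>
  <tr id="sports_1_2_xxx"><td>second</td></tr>
  <tr id="news_1111_22222_yyy"><td>third</td></tr>
  <tr><td>no id</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: ^= means "value starts with".
by_css = soup.select('tr[id^="news"]')

# Function filter: it receives the attribute value, or None when the
# attribute is missing, so guard against None before startswith().
by_func = soup.find_all(
    "tr", id=lambda v: v is not None and v.startswith("news")
)

# CSS *= means "value contains", for a substring anywhere in the id.
by_substring = soup.select('tr[id*="_4343_"]')

print([tr["id"] for tr in by_css])
print([tr["id"] for tr in by_func])
print([tr["id"] for tr in by_substring])
```

The selector form is compact when the condition is a plain prefix or substring; the function filter is the more flexible option when the test involves logic that a selector or regex expresses awkwardly.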
