How to use Beautiful Soup to find a tag with a changing id?


Question

I am using Beautiful Soup in Python.

Here is an example URL:

http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp

In the HTML, there are a bunch of tags and the only way I can specify which ones to find is with their id. The only thing I want to find is the telephone number. The tag looks like this:

<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td> 

I have gone to other URLs on the same website and found almost the same id for the telephone number tag every time. The part that always stays the same is:

'value_xxx_c_1_f_8_a_'

However, the numbers that come after that always change. Is there a way that I can tell Beautiful Soup to look for part of the id and match it and let the other part be numbers like a regular expression could?

Also, once I get the tag, I was wondering...how can I extract the phone number without using regular expressions? I don't know if Beautiful Soup can do that but it would probably be simpler than regex.

Recommended answer

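You can pass a compiled regular expression as the id filter, so the fixed prefix is matched literally while the trailing digits are free to vary (this assumes soup is an already-parsed BeautifulSoup object):

import re
# Match every element whose id is the fixed prefix followed by digits
for tag in soup.find_all(id=re.compile(r"^value_xxx_c_1_f_8_a_\d+$")):
    print(tag.get_text())  # the text inside the tag, i.e. the phone number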
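
For the second part of the question (getting the phone number out without a regex), here is a minimal, self-contained sketch. It assumes the bs4 package is installed and parses an inline HTML snippet modeled on the tag from the question instead of fetching the live page; the first table row and the helper names find_phone and find_phone_css are made up for illustration. The CSS attribute selector variant relies on the selector support bundled with recent Beautiful Soup 4 releases, so treat it as an optional alternative.

```python
import re
from bs4 import BeautifulSoup

# Inline sample modeled on the tag from the question.
# The first row (f_7 / "Middletown") is invented to show that other ids are skipped.
html = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242498">Middletown</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Fixed prefix from the question, followed by one or more digits
PHONE_ID = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")


def find_phone(soup):
    """Return the phone number text using a regex filter on the id attribute."""
    tag = soup.find("td", id=PHONE_ID)
    # get_text() returns the tag's text content, so no regex is needed here
    return tag.get_text(strip=True) if tag else None


def find_phone_css(soup):
    """Same lookup with a CSS 'starts with' attribute selector instead of a regex."""
    tag = soup.select_one('td[id^="value_xxx_c_1_f_8_a_"]')
    return tag.get_text(strip=True) if tag else None


print(find_phone(soup))      # 5134231582
print(find_phone_css(soup))  # 5134231582
```

Both helpers return None when no matching tag exists, which avoids an AttributeError on pages that lack a phone number.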
