Find data within HTML tags using Python

Question

I have the following HTML code I am trying to scrape from a website:

<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>

What I am trying to accomplish is to search the page to find the text "Net Taxes Due" within the tag, find the siblings of the tag, and send the results into a Pandas data frame.

I have the following code:

soup = BeautifulSoup(url, "html.parser")
table = soup.select('#Net Taxes Due')

cells = table.find_next_siblings('td')
cells = [ele.text.strip() for ele in cells]

df = pd.DataFrame(np.array(cells))

print(df)

I've been all over the web looking for a solution and can't come up with something. Appreciate any help.

Thanks!

Recommended answer

Make sure to add the tag name along with your search string. This is how you can do that:

from bs4 import BeautifulSoup

htmldoc = """
<tr>
    <td>Net Taxes Due</td>
    <td class="value-column">$2,370.00</td>
    <td class="value-column">$2,408.00</td>
</tr>
"""    
soup = BeautifulSoup(htmldoc, "html.parser")
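# Find the <td> whose text is exactly "Net Taxes Due", then take the next <td> sibling.
# string= is the current name for the older text= argument, which is deprecated.
item = soup.find('td', string='Net Taxes Due').find_next_sibling("td")
print(item)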
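
The snippet above returns only the first sibling cell, while the question asks for all of the siblings in a Pandas data frame. The following is a minimal sketch of one way to do that, not the answerer's own code. It assumes the same sample HTML as above, swaps in find_next_siblings (plural) to collect every following cell, and uses column names (value_1, value_2, ...) that are made up for illustration.

```python
from bs4 import BeautifulSoup
import pandas as pd

html_doc = """
<tr>
    <td>Net Taxes Due</td>
    <td class="value-column">$2,370.00</td>
    <td class="value-column">$2,408.00</td>
</tr>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Locate the label cell by tag name and exact text.
label = soup.find("td", string="Net Taxes Due")

# find_next_siblings (plural) returns every following <td> in the same row.
values = [td.get_text(strip=True) for td in label.find_next_siblings("td")]

# Build a one-row frame indexed by the label text.
# The column names are hypothetical placeholders, one per value found.
columns = [f"value_{i}" for i in range(1, len(values) + 1)]
df = pd.DataFrame([values], index=[label.get_text(strip=True)], columns=columns)

print(df)
```

When scraping a live page, pass the downloaded HTML text to BeautifulSoup rather than the URL string itself, and check that soup.find did not return None before calling find_next_siblings on it.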
