Find data within HTML tags using Python
Question
I have the following HTML code I am trying to scrape from a website:
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
What I am trying to accomplish is to search the page to find the text "Net Taxes Due" within the tag, find the siblings of the tag, and send the results into a Pandas data frame.
I have the following code:
soup = BeautifulSoup(url, "html.parser")
table = soup.select('#Net Taxes Due')
cells = table.find_next_siblings('td')
cells = [ele.text.strip() for ele in cells]
df = pd.DataFrame(np.array(cells))
print(df)
I've been all over the web looking for a solution and can't come up with something. Appreciate any help.
Thanks!
Recommended answer
Make sure to add the tag name along with your search string. This is how you can do that:
from bs4 import BeautifulSoup
htmldoc = """
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
"""
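soup = BeautifulSoup(htmldoc, "html.parser")
# Find the <td> whose text is exactly "Net Taxes Due", then step to the next <td> sibling.
# Beautiful Soup 4.4.0+ prefers string= over the older text= argument.
item = soup.find('td', string='Net Taxes Due').find_next_sibling("td")
print(item)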
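The snippet above returns only the first value cell, while the question asks for all sibling cells in a Pandas data frame. The following is a minimal sketch of one way to do that, not the original answerer's code. It assumes the same sample markup, and the column name is an illustrative choice. It also sidesteps two likely problems in the question's code: `soup.select('#Net Taxes Due')` is read as a CSS selector (an element with id `Net` containing `Taxes` and `Due` tags) and returns a list, which has no `find_next_siblings` method.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Sample markup copied from the answer above; a real page would be fetched first
# and its HTML text (not its URL) passed to BeautifulSoup.
htmldoc = """
<tr>
<td>Net Taxes Due</td>
<td class="value-column">$2,370.00</td>
<td class="value-column">$2,408.00</td>
</tr>
"""

soup = BeautifulSoup(htmldoc, "html.parser")

# Locate the label cell by its exact text.
label = soup.find("td", string="Net Taxes Due")

# find() returns None when nothing matches, so guard before chaining.
if label is None:
    values = []
else:
    # find_next_siblings returns every following <td> in the same row.
    values = [td.get_text(strip=True) for td in label.find_next_siblings("td")]

# The column name is an assumption made for this example.
df = pd.DataFrame({"Net Taxes Due": values})
print(df)
```

The values stay as strings such as `$2,370.00`. If numbers are needed, the dollar sign and commas would have to be stripped before converting.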