Python - beautifulsoup - how to deal with missing closing tags
Question
I would like to scrape the table from HTML code using BeautifulSoup. A snippet of the HTML is shown below. When I use table.findAll('tr'), I get the entire table and not only the rows (probably because the closing tags are missing from the HTML?).
<TABLE COLS=9 BORDER=0 CELLSPACING=3 CELLPADDING=0>
<TR><TD><B>Artikelbezeichnung</B>
<TD><B>Anbieter</B>
<TD><B>Menge</B>
<TD><B>Taxe-EK</B>
<TD><B>Taxe-VK</B>
<TD><B>Empf.-VK</B>
<TD><B>FB</B>
<TD><B>PZN</B>
<TD><B>Nachfolge</B>
<TR><TD>ACTIQ 200 Mikrogramm Lutschtabl.m.integr.Appl.
<TD>Orifarm
<TD ID=R> 30 St
<TD ID=R> 266,67
<TD ID=R> 336,98
<TD>
<TD>
<TD>12516714
<TD>
</TABLE>
Here is my Python code, which shows what I am struggling with:
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "html.parser")
table = soup.findAll("table")[0]
rows = table.find_all('tr')
for tr in rows:
    print(tr.text)
Accepted answer
As stated in the BeautifulSoup documentation, html5lib parses the document the way a web browser does (as lxml does in this case). It will try to repair your document tree by adding or closing tags where needed.
In your example I've used lxml as the parser, and it gives the following result:
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "lxml")
table = soup.findAll("table")[0]
rows = table.find_all('tr')
for tr in rows:
    print(tr.get_text(strip=True))
Note that lxml added html and body tags because they weren't present in the source (it will try to create a well-formed document, as previously stated).