Remove comment tag but NOT content with BeautifulSoup


Question

I'm practicing some web scraping using BeautifulSoup, specifically I'm looking at NFL game data and more specifically the "Team Stats" table on this page (https://www.pro-football-reference.com/boxscores/201809060phi.htm).

When looking at the HTML for the table I see something like this:

<div class="section_heading">...</div>
<div class="placeholder"></div>
<!--
    <div class="table_outer_container">
        <div class="overthrow table_container" id="div_team_stats">
            <table class="stats_table" id="team_stats" data-cols-to-freeze=1>
                ....
            </table>
        </div>
    </div>
-->

Essentially, the HTML that is being rendered to the page is stored in the HTML as a comment, so I can find the div for the table but BeautifulSoup can't parse the table itself because it's all in the comment.

Is there a good way to get around this so I can parse the table HTML with BeautifulSoup? I figured out how to extract the comment text, but I don't know if there's a good way to convert the resulting String into usable HTML. Alternatively the comment tags could simply be removed which I think would let it be parsed as HTML, but I haven't found a good way to do that either.

Recommended answer

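from bs4 import BeautifulSoup, Comment

# `soup` is assumed to be the BeautifulSoup object already built from the page HTML
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()  # detach the comment node; `comment` still holds its text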

This pulls every comment out of the tree. Each extracted comment is still a string containing the commented-out markup, so you can pass that string to BeautifulSoup again and extract the data inside it. Hope this works.
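To make the second step concrete, here is a minimal sketch of re-parsing the comment text. It is an illustration under assumptions, not the answerer's exact code: the HTML string is a stand-in modelled on the snippet in the question with placeholder cell values, the helper name `find_commented_table` is hypothetical, and the built-in `html.parser` is used as the parser.

```python
from bs4 import BeautifulSoup, Comment

# Stand-in HTML modelled on the question's snippet (assumption: not the live page;
# the team names and numbers are placeholders, not real stats).
html = """
<div class="section_heading">Team Stats</div>
<div class="placeholder"></div>
<!--
    <div class="table_outer_container">
        <div class="overthrow table_container" id="div_team_stats">
            <table class="stats_table" id="team_stats">
                <tr><th>Stat</th><th>Team A</th><th>Team B</th></tr>
                <tr><td>First Downs</td><td>1</td><td>2</td></tr>
            </table>
        </div>
    </div>
-->
"""


def find_commented_table(soup, table_id):
    """Hypothetical helper: return the first <table id=table_id> hidden in a comment."""
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # A Comment is a string subclass, so its text can be parsed as a new document.
        fragment = BeautifulSoup(str(comment), "html.parser")
        table = fragment.find("table", id=table_id)
        if table is not None:
            return table
    return None


soup = BeautifulSoup(html, "html.parser")
table = find_commented_table(soup, "team_stats")

# Collect the text of every header and data cell, row by row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

Parsing each comment into its own soup leaves the original tree untouched, so the lookup can be repeated for other commented-out tables on the same page.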
