How to find the comment tag <!--...--> with BeautifulSoup?

Views: 36 | Published: 2022/1/18 21:00:19 | Tags: python, html, tags, beautifulsoup

Question

I tried soup.find('!--') but it doesn't seem to work. Thanks in advance.
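soup.find('!--') looks for an element whose tag name is literally "!--", so it finds nothing: BeautifulSoup 4 does not store comments as tags but as Comment objects, a subclass of NavigableString. A minimal sketch, assuming the bs4 package and Python's built-in "html.parser" backend, that collects every comment by filtering strings on their type (the sample HTML is made up for illustration):

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample page with two comments
html = """
<p>visible text</p>
<!-- first comment -->
<div><!-- <span class="titlefont"> <i>Wednesday 110518</i></span> --></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Comments are NavigableString subclasses, not tags, so match them
# through the string argument with a type check instead of a tag name
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for c in comments:
    # The value of a Comment is the text between <!-- and -->
    print(repr(str(c)))
```

Each Comment behaves like an ordinary str holding the comment body, so the usual string tools (the in operator, re.search, and so on) work on it directly.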

Thanks for the tip on how to find all comments. I have a follow-up question: how do I search for one specific comment?

For example, I have the following comment tag:

<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->

I really just want the text Wednesday 110518. The "110518" is a date in YYMMDD format, which I'm planning to use as my search target. However, I don't know how to search for something within a specific comment tag.
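With BeautifulSoup, one way to target a single comment is to make the string filter also check the comment's text for the date. Because a comment's contents are stored as plain text rather than parsed tags, the matching comment can then be fed back into BeautifulSoup to reach its <i> element. A sketch under the same assumptions (bs4 with "html.parser"); find_comment_text is a hypothetical helper name, and the sample markup mirrors the comments used in the answer:

```python
from bs4 import BeautifulSoup, Comment

html = """
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
<!-- <span class="titlefont"> <i>Thursday 110521</i>(05:00PM)<br /></span> -->
"""

def find_comment_text(html, target):
    """Hypothetical helper: return the <i> text of the first comment containing target."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only Comment strings whose raw text mentions the target date
    comment = soup.find(string=lambda t: isinstance(t, Comment) and target in t)
    if comment is None:
        return None
    # The comment body is unparsed text, so parse it again to reach its tags
    inner = BeautifulSoup(str(comment), "html.parser")
    italic = inner.find("i")
    return italic.get_text(strip=True) if italic else None

print(find_comment_text(html, "110518"))  # prints: Wednesday 110518
```

Searching on the raw date string is simple but loose: "110518" would also match if it appeared elsewhere in some other comment, so stricter filtering (for example on the span's class) may be needed on real pages.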

Recommended answer

Pyparsing lets you search for HTML comments using its built-in htmlComment expression, and attach parse-time callbacks to validate and extract the individual data fields within each comment:

from pyparsing import makeHTMLTags, oneOf, withAttribute, Word, nums, Group, htmlComment
import calendar

# have pyparsing define tag start/end expressions for the 
# tags we want to look for inside the comments
span,spanEnd = makeHTMLTags("span")
i,iEnd = makeHTMLTags("i")

# only want spans with class=titlefont
span.addParseAction(withAttribute(**{'class':'titlefont'}))

# define what specifically we are looking for in this comment
weekdayname = oneOf(list(calendar.day_name))
integer = Word(nums)
dateExpr = Group(weekdayname("day") + integer("daynum"))
commentBody = '<!--' + span + i + dateExpr("date") + iEnd

# define a parse action to attach to the standard htmlComment expression,
# to extract only what we want (or raise a ParseException in case 
# this is not one of the comments we're looking for)
def grabCommentContents(tokens):
    return commentBody.parseString(tokens[0])
htmlComment.addParseAction(grabCommentContents)


# let's try it
htmlsource = """
want to match this one
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->

don't want the next one, wrong span class
<!-- <span class="bodyfont"> <i>Wednesday 110519</i>(05:00PM)<br /></span> -->

not even a span tag!
<!-- some other text with a date in italics <i>Wednesday 110520</i>(05:00PM)<br /></span> -->

another matching comment, on a different day
<!-- <span class="titlefont"> <i>Thursday 110521</i>(05:00PM)<br /></span> -->
"""

for comment in htmlComment.searchString(htmlsource):
    parsedDate = comment.date
    # date info can be accessed like elements in a list
    print(parsedDate[0], parsedDate[1])
    # because we named the expressions within the dateExpr Group
    # we can also get at them by name (this is much more robust, and
    # easier to maintain/update later)
    print(parsedDate.day)
    print(parsedDate.daynum)
    print()

Prints:

Wednesday 110518
Wednesday
110518

Thursday 110521
Thursday
110521
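For comparison, here is a standard-library-only sketch of the same filtering, using neither pyparsing nor BeautifulSoup: html.parser.HTMLParser passes each comment body to its handle_comment method, and a regular expression stands in for the pyparsing grammar. It also turns the YYMMDD string into a real date with datetime.strptime. The CommentDateParser class and the DATE_IN_TITLE pattern are hypothetical names, and the regex assumes exactly the markup shown in htmlsource above:

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Assumed layout: <span class="titlefont"> directly followed by <i>Weekday YYMMDD</i>
DATE_IN_TITLE = re.compile(
    r'<span\s+class="titlefont">\s*<i>\s*([A-Za-z]+)\s+(\d{6})\s*</i>'
)

class CommentDateParser(HTMLParser):
    """Hypothetical parser: collect (weekday, YYMMDD, date) from matching comments."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_comment(self, data):
        # data is the text between <!-- and -->
        m = DATE_IN_TITLE.search(data)
        if m:
            day, daynum = m.groups()
            # %y%m%d turns '110518' into the date 2011-05-18
            parsed = datetime.strptime(daynum, "%y%m%d").date()
            self.found.append((day, daynum, parsed))

htmlsource = """
<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
<!-- <span class="bodyfont"> <i>Wednesday 110519</i>(05:00PM)<br /></span> -->
<!-- some other text with a date in italics <i>Wednesday 110520</i>(05:00PM)<br /></span> -->
<!-- <span class="titlefont"> <i>Thursday 110521</i>(05:00PM)<br /></span> -->
"""

parser = CommentDateParser()
parser.feed(htmlsource)
parser.close()
for day, daynum, parsed in parser.found:
    print(day, daynum, parsed.isoformat())
# Wednesday 110518 2011-05-18
# Thursday 110521 2011-05-21
```

The regex is the fragile part: it requires class="titlefont" to be the span's first attribute, double-quoted, and immediately followed by the <i> tag, whereas makeHTMLTags and withAttribute in the pyparsing version tolerate reordered attributes and different quoting.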
