替换自定义“HTML"Python 字符串中的标记 [英] Replacing a custom "HTML" tag in a Python string

查看:63
本文介绍了替换自定义“HTML"Python 字符串中的标记的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望能够在字符串中包含自定义的HTML"标签,例如:"This is a <photo id="4"/> string".>

在这种情况下,自定义标签是 .如果它更容易,我也可以改变这个自定义标签以不同的方式编写,即 [photo id:4] 或其他东西.

我希望能够将此字符串传递给将提取标记 <photo id="4"/> 的函数,并允许我将其转换为一些更复杂的模板像 <div class="photo"><img src="...." alt="..."></div>,然后我可以使用它替换原始字符串中的标签.

我想象它的工作方式是这样的:

<预><代码>>>>content = "这是一个 <photo id="4"/> 字符串"# 将字符串传递给一个函数,该函数返回具有给定名称的所有标签.>>>标签 = parse_tags('照片', 字符串)>>>打印(标签)[{'tag': 'photo', 'id': 4, 'raw': '<photo id="4"/>'}]# 现在我知道我需要渲染一张 ID 为 4 的照片,所以我可以将它传递给某种模板>>>渲染 = render_photo(id=tags[0]['id'])>>>打印(渲染)<div class="photo"><img src="...." alt="..."></div>>>>content = content.replace(tags[0]['raw'], 渲染)>>>打印(内容)这是一个<div class="photo"><img src="...." alt="..."></div>细绳

我认为这是一个相当普遍的模式,比如在博客文章中放一张照片,所以我想知道是否有一个库可以做类似于示例 parse_tags 以上功能.还是需要我写?

此照片标签示例只是一个示例.我想要不同名称的标签.作为一个不同的例子,也许我有一个人的数据库,我想要一个像 <person name="John Doe"/> 这样的标签.在这种情况下,我想要的输出类似于 {'tag': 'person', 'name': 'John Doe', 'raw': '<person name="John Doe"/>'}.然后,我可以使用该名称查找该人并返回该人的 vcard 或其他内容的渲染模板.

解决方案

对于这种手术式"解析(您希望隔离特定标签而不是创建完整的分层文档),pyparsing 的 makeHTMLTags 方法非常有用.

请参阅下面带注释的脚本,显示解析器的创建,并将其用于 parseTagreplaceTag 方法:

导入pyparsing为ppdef make_tag_parser(tag):# makeHTMLTags 返回 2 个解析器,一个用于开始标签,一个用于# 结束标签 - 我们只需要开始标签;解析器将返回已解析#标签本身的字段tag_parser = pp.makeHTMLTags(tag)[0]# 而不是返回标签的解析位,使用 originalTextFor 来# 将原始标签作为 token[0] 返回(指定 asString=False 将保留# 解析的属性和标签名称作为属性)解析器 = pp.originalTextFor(tag_parser, asString=False)# 再添加一个回调来定义从 t[0] 复制的 'raw' 属性def add_raw_attr(t):t['原始'] =​​ t[0]parser.addParseAction(add_raw_attr)返回解析器# parseTag 查找所有匹配项并报告其属性def parseTag(tag, s):返回 make_tag_parser(tag).searchString(s)content = """这是一个 <photo id="4"/> 字符串"""tag_matches = parseTag("照片", 内容)对于 tag_matches 中的匹配:打印(匹配.转储())打印(原始:{!r}".格式(匹配.原始))打印(标签:{!r}".格式(匹配.标签))print("id: {!r}".format(match.id))# 变换标签以执行标签-> div 变换def replaceTag(tag, transform, s):解析器 = make_tag_parser(tag)# 再添加一个解析动作来进行转换parser.addParseAction(lambda t: transform.format(**t))返回 parser.transformString(s)打印(替换标签(照片",'<div class="{tag}"><img src="<src_path>/img_{id}.jpg."alt="{tag}_{id}"></div>',内容))

打印:

['<photo id="4"/>']- 空:真- ID:'4'- 原始:'<photo id="4"/>'- startPhoto: ['photo', ['id', '4'], True][0]:照片[1]:['id', '4'][2]:真的- 标记照片'raw: '<photo id="4"/>'标记照片'编号:'4'这是一个<div class="photo"><img src="<src_path>/img_4.jpg."alt="photo_4"></div>细绳

I want to be able to include a custom "HTML" tag in a string, such as: "This is a <photo id="4" /> string".

In this case the custom tag is <photo id="4" />. I would also be fine changing this custom tag to be written differently if it makes it easier, ie [photo id:4] or something.

I want to be able to pass this string to a function that will extract the tag <photo id="4" />, and allow me to transform this to some more complicated template like <div class="photo"><img src="...." alt="..."></div>, which I can then use to replace the tag in the original string.

I'm imaging it work something like this:

>>> content = "This is a <photo id="4" /> string"
# Pass the string to a function that returns all the tags with the given name.
>>> tags = parse_tags('photo', string)
>>> print(tags)
[{'tag': 'photo', 'id': 4, 'raw': '<photo id="4" />'}]
# Now that I know I need to render a photo with ID 4, so I can pass that to some sort of template thing
>>> rendered = render_photo(id=tags[0]['id'])
>>> print(rendered)
<div class="photo"><img src="...." alt="..."></div>
>>> content = content.replace(tags[0]['raw'], rendered)
>>> print(content)
This is a <div class="photo"><img src="...." alt="..."></div> string

I think this is a fairly common pattern, for something like putting a photo in a blog post, so I'm wondering if there is a library out there that will do something similar to the example parse_tags function above. Or do I need to write it?

This example of the photo tag is just a single example. I would want to have tags with different names. As a different example, maybe I have a database of people and I want a tag like <person name="John Doe" />. In that case the output I want is something like {'tag': 'person', 'name': 'John Doe', 'raw': '<person name="John Doe" />'}. I can then use the name to look that person up and return a rendered template of the person's vcard or something.

解决方案

For this kind of "surgical" parsing (where you want to isolate specific tags instead of creating a full hierarchical document), pyparsing's makeHTMLTags method can be very useful.

See the annotated script below, showing the creation of the parser, and using it for parseTag and replaceTag methods:

import pyparsing as pp

def make_tag_parser(tag):
    # makeHTMLTags returns 2 parsers, one for the opening tag and one for the
    # closing tag - we only need the opening tag; the parser will return parsed
    # fields of the tag itself
    tag_parser = pp.makeHTMLTags(tag)[0]

    # instead of returning parsed bits of the tag, use originalTextFor to
    # return the raw tag as token[0] (specifying asString=False will retain
    # the parsed attributes and tag name as attributes)
    parser = pp.originalTextFor(tag_parser, asString=False)

    # add one more callback to define the 'raw' attribute, copied from t[0]
    def add_raw_attr(t):
        t['raw'] = t[0]
    parser.addParseAction(add_raw_attr)

    return parser

# parseTag to find all the matches and report their attributes
def parseTag(tag, s):
    return make_tag_parser(tag).searchString(s)


content = """This is a <photo id="4" /> string"""

tag_matches = parseTag("photo", content)
for match in tag_matches:
    print(match.dump())
    print("raw: {!r}".format(match.raw))
    print("tag: {!r}".format(match.tag))
    print("id:  {!r}".format(match.id))


# transform tag to perform tag->div transforms
def replaceTag(tag, transform, s):
    parser = make_tag_parser(tag)

    # add one more parse action to do transform
    parser.addParseAction(lambda t: transform.format(**t))
    return parser.transformString(s)

print(replaceTag("photo", 
                   '<div class="{tag}"><img src="<src_path>/img_{id}.jpg." alt="{tag}_{id}"></div>', 
                   content))

Prints:

['<photo id="4" />']
- empty: True
- id: '4'
- raw: '<photo id="4" />'
- startPhoto: ['photo', ['id', '4'], True]
  [0]:
    photo
  [1]:
    ['id', '4']
  [2]:
    True
- tag: 'photo'
raw: '<photo id="4" />'
tag: 'photo'
id:  '4'
This is a <div class="photo"><img src="<src_path>/img_4.jpg." alt="photo_4"></div> string

这篇关于替换自定义“HTML"Python 字符串中的标记的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆