BeautifulSoup replaceWith() method adding escaped html, want it unescaped

Views: 81 | Published: 2020/9/20 7:41:13 | Tags: python django beautifulsoup


Question

I have a Python method (thanks to this snippet) that takes some HTML and wraps <a> tags around ONLY unformatted links, using BeautifulSoup and Django's urlize:

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(urlizedText)

    print(soup)

    return str(soup)

Sample input text (as output by the first print statement) is this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: http://google.ca

The resulting return text (as output by the second print statement) is this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: &lt;a href="http://google.ca"&gt;http://google.ca&lt;/a&gt;

As you can see, it is formatting the link, but with escaped HTML, so when I print it in a template with {{ my.html|safe }} it doesn't render as HTML.

So how can I get the tags added by urlize to be unescaped and render properly as HTML? I suspect this has something to do with me using it as a method instead of a template filter. I can't actually find the docs on this method; it doesn't appear in django.utils.html.

It appears the escaping actually happens in this line: textNode.replaceWith(urlizedText).

Recommended answer

You can turn your urlizedText string into a new BeautifulSoup object, and it will be treated as a tag in its own right rather than as text within one (which is escaped, as you'd expect):

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(BeautifulSoup(urlizedText, "html.parser"))

    print(soup)

    return str(soup)
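
To see the difference in isolation, here is a minimal sketch that runs without Django. It assumes a hypothetical helper, simple_urlize, as a rough regex-based stand-in for django.utils.html.urlize (the real function handles many more cases), and it uses the newer find_all / replace_with spellings of the same BeautifulSoup methods. The first function shows that a plain string becomes a text node and is escaped on output; the second parses the string first, so the <a> tag survives.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified URL pattern; not what Django's urlize uses.
URL_RE = re.compile(r"https?://[^\s<]+")


def simple_urlize(text):
    """Hypothetical stand-in for urlize: wrap bare URLs in <a> tags."""
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text
    )


def replace_with_plain_string(html):
    """Reproduce the problem: a str passed to replace_with stays text."""
    soup = BeautifulSoup(html, "html.parser")
    for text_node in soup.find_all(string=True):
        text_node.replace_with(simple_urlize(str(text_node)))
    return str(soup)  # markup inside text nodes is escaped here


def html_urlize(html):
    """Apply the fix: parse the linked string before inserting it."""
    soup = BeautifulSoup(html, "html.parser")
    for text_node in soup.find_all(string=True):
        if text_node.parent is not None and text_node.parent.name == "a":
            continue  # skip links that are already formatted
        linked = simple_urlize(str(text_node))
        if linked != str(text_node):
            # A parsed fragment is inserted as real tags, not as text.
            text_node.replace_with(BeautifulSoup(linked, "html.parser"))
    return str(soup)


sample = (
    'formatted: <a href="http://google.ca">http://google.ca</a>, '
    "unformatted: http://google.ca"
)
print(replace_with_plain_string("<p>see http://google.ca</p>"))
print(html_urlize(sample))
```

One caveat with this approach: a text node holds unescaped text, so a node that contains a literal < character would be re-parsed as markup when it is turned into a new BeautifulSoup object. If the input can contain such text, it may be worth escaping it before linking.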
