Customize BeautifulSoup's prettify by tag

Question

I was wondering whether it is possible to make prettify not create new lines for specific tags.

I would like span and a tags not to be split up. For example:

from bs4 import BeautifulSoup as BS

doc = """<div><div><span>a</span><span>b</span>
<a>link</a></div><a>link1</a><a>link2</a></div>"""

# Name the parser explicitly so bs4 does not have to guess one
soup = BS(doc, "html.parser")
print(soup.prettify())

Below is what I want to print:

<div>
    <div>
        <span>a</span><span>b</span>
        <a>link</a>
    </div>
    <a>link1</a><a>link2</a>
</div>

But this is what actually prints:

<div>
    <div>
        <span>
            a
        </span>
        <span>
            b
        </span>
        <a>
            link
        </a>
    </div>
    <a>
        link1
    </a>
    <a>
        link2
    </a>
</div>

Placing inline elements on new lines like that adds whitespace between them, which slightly alters how the page renders. The original post linked two jsfiddles showing the difference.
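The effect can be seen without a browser by comparing the text content of the compact and the prettified markup. The sketch below uses only the standard library; `TextCollector` and `text_of` are hypothetical helper names, and collapsing whitespace with `split()` is only a rough approximation of how a browser renders inline text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node the parser encounters."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_of(markup):
    """Return the concatenated text nodes of an HTML fragment."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


compact = "<span>a</span><span>b</span>"
prettified = "<span>\n a\n</span>\n<span>\n b\n</span>"

compact_text = text_of(compact)
pretty_text = text_of(prettified)

# Approximate browser whitespace collapsing: runs of whitespace become one space.
collapsed = " ".join(pretty_text.split())

print(repr(compact_text))  # 'ab'
print(repr(collapsed))     # 'a b'
```

The compact markup yields the text `ab`, while the prettified version yields `a b` once whitespace is collapsed, which is the extra gap described above.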

If you're wondering why that matters for BeautifulSoup: I am writing a web-page debugger, and the prettify function would be very useful (along with other things in bs4). But if I prettify the document, I risk altering how it renders.

So, is there any way to customize the prettify function so that it does not break up certain tags?

Recommended answer

I'm posting a quick hack until I find a better solution.

I'm actually using it in my project to avoid breaking up textarea and pre tags. Replace ['span', 'a'] with the tags whose contents you do not want re-indented.

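from bs4 import BeautifulSoup

markup = """<div><div><span>a</span><span>b</span>
<a>link</a></div><a>link1</a><a>link2</a></div>"""

# Double curly brackets to avoid problems with .format()
stripped_markup = markup.replace('{', '{{').replace('}', '}}')

stripped_markup = BeautifulSoup(stripped_markup, 'html.parser')

unformatted_tag_list = []

for i, tag in enumerate(stripped_markup.find_all(['span', 'a'])):
    # Save the serialized tag; undo the brace doubling here, because
    # .format() inserts substituted values verbatim without re-scanning them
    unformatted_tag_list.append(str(tag).replace('{{', '{').replace('}}', '}'))
    # Leave a placeholder such as {unformatted_tag_list[0]} in the tag's place
    tag.replace_with('{' + 'unformatted_tag_list[{0}]'.format(i) + '}')

# prettify() indents the placeholders; .format() then swaps the saved tags back in
pretty_markup = stripped_markup.prettify().format(unformatted_tag_list=unformatted_tag_list)

print(pretty_markup)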
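The hack relies on two features of `str.format()`: doubled braces are emitted as literal braces, and a field such as `{name[0]}` indexes into the argument passed as `name`. The following standard-library sketch isolates that mechanism. The names `saved` and `template` and the sample markup are made up for illustration; the template string stands in for what `prettify()` would return after the tags were swapped for placeholders.

```python
# Stand-ins for the serialized tags that were pulled out of the tree.
saved = ["<span>a</span>", "<a href='#'>{x}</a>"]

# Literal braces in the surrounding markup are doubled so .format() leaves
# them alone; {saved[0]} and {saved[1]} are the placeholders.
template = "<style>p {{ color: red }}</style>\n{saved[0]}\n{saved[1]}"

result = template.format(saved=saved)
print(result)
```

Substituted values are inserted verbatim and are not scanned again, so the `{x}` inside the second saved tag comes through unchanged. This also means any brace doubling applied to a saved tag is not undone by `.format()`.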
