Customize BeautifulSoup's prettify by tag
Question
I was wondering whether it is possible to make prettify not create new lines for specific tags.
I would like span and a tags not to be split up. For example:
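from bs4 import BeautifulSoup as BS

doc = """<div><div><span>a</span><span>b</span>
<a>link</a></div><a>link1</a><a>link2</a></div>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BS(doc, "html.parser")
print(soup.prettify())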
Below is what I want to print:
<div>
 <div>
  <span>a</span><span>b</span>
  <a>link</a>
 </div>
 <a>link1</a><a>link2</a>
</div>
But this is what actually prints:
<div>
 <div>
  <span>
   a
  </span>
  <span>
   b
  </span>
  <a>
   link
  </a>
 </div>
 <a>
  link1
 </a>
 <a>
  link2
 </a>
</div>
Placing inline tags on new lines like that adds whitespace between them, which slightly alters how the rendered page looks. I will link you to two jsfiddles displaying the difference:
If you're wondering why that matters for BeautifulSoup, it is because I am writing a web-page debugger, and the prettify function would be very useful (along with other things in bs4). But if I prettify the document, I risk altering how it renders.
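The whitespace concern can be checked without a browser. The following is a minimal sketch, assuming bs4 is installed and using the built-in html.parser; the variable names are illustrative only. It re-parses the prettified markup and compares the extracted text with the original.

```python
from bs4 import BeautifulSoup

doc = "<div><span>a</span><span>b</span></div>"

soup = BeautifulSoup(doc, "html.parser")
original_text = soup.get_text()

# Re-parse the prettified markup, which is what a browser would receive
reparsed = BeautifulSoup(soup.prettify(), "html.parser")
prettified_text = reparsed.get_text()

# The original text has no whitespace between "a" and "b";
# the prettified version has newlines and indentation between them.
print(repr(original_text))
print(repr(prettified_text))
```

A browser collapses that added whitespace into a single space between the two inline elements, which is the visual difference described above.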
So, is there any way to customize the prettify function so that I can set it to not break up certain tags?
Recommended answer
I'm posting a quick hack until I find a better solution.
I'm actually using it in my project to avoid breaking textarea and pre tags. Replace ['span', 'a'] with the tags whose contents you do not want reindented.
from bs4 import BeautifulSoup

markup = """<div><div><span>a</span><span>b</span>
<a>link</a></div><a>link1</a><a>link2</a></div>"""

# Double the curly brackets so literal braces survive .format()
stripped_markup = markup.replace('{', '{{').replace('}', '}}')
stripped_markup = BeautifulSoup(stripped_markup, 'html.parser')

unformatted_tag_list = []
for i, tag in enumerate(stripped_markup.find_all(['span', 'a'])):
    # Save the tag's markup (undoing the brace doubling), then swap in a placeholder
    unformatted_tag_list.append(str(tag).replace('{{', '{').replace('}}', '}'))
    tag.replace_with('{' + 'unformatted_tag_list[{0}]'.format(i) + '}')

# prettify() the placeholder document, then substitute the saved tags back in
pretty_markup = stripped_markup.prettify().format(unformatted_tag_list=unformatted_tag_list)
print(pretty_markup)
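The same placeholder idea can be wrapped in a reusable function. This is a sketch of a variation, not part of the original answer: the function name prettify_keeping_inline and its parameters are hypothetical, and it swaps str.format() for random tokens with plain string replacement so that braces in the markup need no escaping. It also skips tags nested inside an already protected tag, since the outer tag's saved markup already contains them.

```python
import uuid

from bs4 import BeautifulSoup


def prettify_keeping_inline(markup, inline_tags=("span", "a"), parser="html.parser"):
    """Prettify markup while leaving the listed tags on a single line each.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    soup = BeautifulSoup(markup, parser)
    # Random prefix that is very unlikely to occur in the markup itself
    token = uuid.uuid4().hex
    saved = {}

    for i, tag in enumerate(soup.find_all(list(inline_tags))):
        # A tag nested inside an already replaced tag is no longer attached
        # to the tree; its markup is part of the outer tag's saved string.
        if not any(parent is soup for parent in tag.parents):
            continue
        # Brackets delimit the key so that key 1 is never a prefix of key 10
        key = "[{0}:{1}]".format(token, i)
        saved[key] = str(tag)
        tag.replace_with(key)

    pretty = soup.prettify()
    # Put the untouched tag markup back in place of each placeholder
    for key, html in saved.items():
        pretty = pretty.replace(key, html)
    return pretty


markup = """<div><div><span>a</span><span>b</span>
<a>link</a></div><a>link1</a><a>link2</a></div>"""
result = prettify_keeping_inline(markup)
print(result)
```

Note that prettify may still place adjacent protected tags on separate lines, because each placeholder is a separate text node. The sketch only guarantees that the inside of each protected tag is left as it was.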