Custom indent width for BeautifulSoup .prettify()


Question

Is there any way to define a custom indent width for the .prettify() function? From what I can tell from its source:

def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)

There is no way to specify the indent width. I think it's because of this line in the decode_contents() function:

s.append(" " * (indent_level - 1))

That hardcodes the indent to 1 space per level! (WHY!!) I tried specifying indent_level=4, but that only shifts the starting offset; each nesting level is still 1 space:

    <section>
     <article>
      <h1>
      </h1>
      <p>
      </p>
     </article>
    </section>

Which just looks silly. :|

Now, I can hack around this, but I want to be sure I'm not missing anything, because this should be a basic feature. :-/

If you know a better way of prettifying HTML code, let me know.

Recommended answer

I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

import re

# Capture each line's leading whitespace, then emit it twice (1 space -> 2).
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify_2space(s, encoding=None, formatter="minimal"):
    return r.sub(r'\1\1', s.prettify(encoding, formatter))
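
To see what the substitution does in isolation, here is a minimal sketch that applies the same idea to a plain string, with no BeautifulSoup involved. The helper name `reindent` and its `width` parameter are made up for illustration. It assumes the input uses one space per nesting level, as prettify() output does, and it matches only spaces rather than `\s`, which is a small deviation from the snippet above.

```python
import re

# Leading spaces at the start of every line (one space per level in prettify() output).
_LEADING = re.compile(r'^( *)', re.MULTILINE)

def reindent(text, width=2):
    """Hypothetical helper: repeat each line's leading spaces `width` times."""
    return _LEADING.sub(lambda m: m.group(1) * width, text)

sample = "<section>\n <article>\n  <h1>\n  </h1>\n </article>\n</section>"
print(reindent(sample, width=4))
```

A callable replacement is used here instead of building a `\1\1...` string; both do the same thing, the callable just avoids constructing the replacement pattern by string multiplication.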

Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

import re
import bs4

orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify(self, encoding=None, formatter="minimal", indent_width=4):
    # Repeat each line's leading whitespace indent_width times.
    return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))
bs4.BeautifulSoup.prettify = prettify

So:

x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x)
print(soup.prettify(indent_width=3))

...gives:

<html>
   <body>
      <section>
         <article>
            <h1>
            </h1>
            <p>
            </p>
         </article>
      </section>
   </body>
</html>

Obviously, if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
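
One way such a generic wrapper might look is sketched below. The names `with_indent_width` and `FakeTag` are invented for this example, and `FakeTag` is only a stand-in so the snippet runs without bs4 installed. The sketch also assumes that a bytes result (when an encoding is passed) should be returned unchanged, since a str pattern cannot be applied to bytes.

```python
import functools
import re

_LEADING = re.compile(r'^( *)', re.MULTILINE)

def with_indent_width(orig_prettify, default_width=4):
    """Wrap a prettify-style method so it accepts an extra indent_width keyword."""
    @functools.wraps(orig_prettify)
    def prettify(self, *args, indent_width=default_width, **kwargs):
        result = orig_prettify(self, *args, **kwargs)
        if isinstance(result, bytes):
            # An encoding was requested; this sketch leaves bytes untouched.
            return result
        return _LEADING.sub(lambda m: m.group(1) * indent_width, result)
    return prettify

# Stand-in class (hypothetical) that mimics prettify()'s one-space-per-level output.
class FakeTag:
    def prettify(self, encoding=None, formatter="minimal"):
        return "<p>\n <b>\n </b>\n</p>\n"

FakeTag.prettify = with_indent_width(FakeTag.prettify)
print(FakeTag().prettify(indent_width=3))
```

With bs4 imported, the same wrapper could presumably be applied as `bs4.Tag.prettify = with_indent_width(bs4.Tag.prettify)`, and likewise for `bs4.BeautifulSoup.prettify`; check which classes define prettify in your bs4 version before relying on it.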
