Custom indent width for BeautifulSoup .prettify()

Question

Is there any way to define a custom indent width for the .prettify() function? From what I can tell from its source:

def prettify(self, encoding=None, formatter="minimal"):
    if encoding is None:
        return self.decode(True, formatter=formatter)
    else:
        return self.encode(encoding, True, formatter=formatter)

There is no way to specify the indent width. I think it's because of this line in the decode_contents() function:

s.append(" " * (indent_level - 1))

Which has a fixed width of 1 space per level! (WHY!!) I tried specifying indent_level=4, but that just results in this:

    <section>
     <article>
      <h1>
      </h1>
      <p>
      </p>
     </article>
    </section>

Which looks just plain stupid. :|
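To see why indent_level=4 only shifts the whole block, here is a minimal sketch that reuses the expression from the quoted decode_contents() line. The helper name indent_prefix is hypothetical and exists only for this illustration; it is not part of bs4.

```python
def indent_prefix(indent_level):
    # Same expression as the quoted line: one space per nesting level.
    return " " * (indent_level - 1)

# Starting at level 4 moves every line right, but each deeper level
# still adds exactly one space.
for level in (4, 5, 6):
    print(level, repr(indent_prefix(level)))

step = len(indent_prefix(5)) - len(indent_prefix(4))
print("extra spaces per level:", step)
```

So the starting level is an offset, not a width: nothing in that expression scales with a configurable number of spaces.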

Now, I can hack around this, but I just want to be sure I'm not missing anything, because this should be a basic feature. :-/

If you have a better way of prettifying HTML code, let me know.

Recommended answer

I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

import re

# Capture each line's leading whitespace so it can be repeated.
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify_2space(s, encoding=None, formatter="minimal"):
    return r.sub(r'\1\1', s.prettify(encoding, formatter))
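One caveat with this kind of post-processing: when an encoding is passed, prettify() returns bytes, and a str pattern cannot be applied to bytes. The following is a minimal sketch, not the answer's own code, of a standalone helper that accepts either type. The name widen_indent and its factor parameter are assumptions made for this example, and the bytes branch assumes an ASCII-compatible encoding such as UTF-8.

```python
import re

# Match leading spaces only; \s would also match newlines on blank lines.
_LEADING_STR = re.compile(r"^( +)", re.MULTILINE)
_LEADING_BYTES = re.compile(rb"^( +)", re.MULTILINE)

def widen_indent(pretty, factor=2):
    """Repeat each line's leading spaces `factor` times."""
    pattern = _LEADING_BYTES if isinstance(pretty, bytes) else _LEADING_STR
    return pattern.sub(lambda m: m.group(1) * factor, pretty)

# A hand-written sample in the one-space-per-level style shown above.
sample = "<section>\n <article>\n  <h1>\n  </h1>\n </article>\n</section>"
print(widen_indent(sample))
print(widen_indent(sample.encode("utf-8"), factor=3))
```

Like the regex above, this touches every line that starts with spaces, so lines inside whitespace-sensitive content that happen to begin with spaces would be widened as well.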

Actually, I monkeypatched prettify_2space in place of prettify in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

import re
import bs4

# Keep a reference to the original method before replacing it.
orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify(self, encoding=None, formatter="minimal", indent_width=4):
    return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))
bs4.BeautifulSoup.prettify = prettify

So:

x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x, 'lxml')  # lxml wraps the fragment in <html><body>
print(soup.prettify(indent_width=3))

... gives:

<html>
   <body>
      <section>
         <article>
            <h1>
            </h1>
            <p>
            </p>
         </article>
      </section>
   </body>
</html>

Obviously, if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.
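The generic wrapper mentioned above could look roughly like this. It is a sketch under stated assumptions: with_indent_width is a hypothetical name, FakeTag is a stand-in class used so the example runs without bs4 installed, and the wrapper only handles str results (that is, calls made without an encoding).

```python
import functools
import re

_LEADING = re.compile(r"^( +)", re.MULTILINE)

def with_indent_width(orig_prettify):
    """Wrap a prettify-style method so it also accepts indent_width."""
    @functools.wraps(orig_prettify)
    def prettify(self, *args, indent_width=4, **kwargs):
        result = orig_prettify(self, *args, **kwargs)
        # Repeat each line's leading spaces indent_width times.
        return _LEADING.sub(lambda m: m.group(1) * indent_width, result)
    return prettify

class FakeTag:
    """Hypothetical stand-in for a class whose prettify() indents by one space."""
    def prettify(self):
        return "<p>\n <b>\n </b>\n</p>"

# Apply the same wrapper to every class that needs it.
FakeTag.prettify = with_indent_width(FakeTag.prettify)
print(FakeTag().prettify(indent_width=2))
```

With bs4, the same wrapper would presumably be applied once per class, for example to Tag.prettify and to BeautifulSoup.prettify, so the regex logic lives in one place.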
