BeautifulSoup innerhtml?
Question
Let's say I have a page with a div. I can easily get that div with soup.find().
Now that I have the result, I'd like to print the WHOLE innerhtml of that div: I mean, I'd need a string with ALL the html tags and text together, exactly like the string I'd get in javascript with obj.innerHTML. Is this possible?
Recommended answer
TL;DR
With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML property might look something like this:
def innerHTML(element):
    """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
    return element.encode_contents()
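As a rough illustration of the difference between the inner and outer HTML, here is a minimal sketch. The sample markup and the variable names are made up for this example, and it assumes the bs4 package is installed and that the built-in "html.parser" is used:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = '<div id="main"><p>Hello <b>world</b></p><span>!</span></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="main")

# Inner HTML: everything between <div ...> and </div>.
inner_text = div.decode_contents()   # Python str
inner_bytes = div.encode_contents()  # UTF-8 encoded bytes

# Outer HTML: str() on the tag includes the <div> wrapper itself.
outer_text = str(div)

print(inner_text)
print(inner_bytes)
print(outer_text)
```

The str(div) call is what you'd use for the JavaScript outerHTML equivalent; the two *_contents() methods drop the enclosing tag.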
These functions aren't currently in the online documentation, so I'll quote the current function definitions and the docstrings from the code.
def encode_contents(
    self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
    formatter="minimal"):
    """Renders the contents of this tag as a bytestring.

    :param indent_level: Each line of the rendering will be
        indented this many spaces.
    :param encoding: The bytestring will be in this encoding.
    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """
See also the documentation on formatters; you'll most likely either use formatter="minimal" (the default) or formatter="html" (for html entities) unless you want to manually process the text in some way.
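To make the difference between the two formatters concrete, here is a small sketch. The sample markup is invented for this example, and it assumes bs4 is installed and parses with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing non-ASCII text and an ampersand.
soup = BeautifulSoup("<div><p>café &amp; crème</p></div>", "html.parser")
div = soup.div

# "minimal" escapes only what is needed for valid markup (such as &),
# leaving non-ASCII characters as they are.
minimal = div.decode_contents(formatter="minimal")

# "html" additionally converts characters to named HTML entities
# where one exists (for example é becomes &eacute;).
html_entities = div.decode_contents(formatter="html")

print(minimal)
print(html_entities)
```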
encode_contents returns an encoded bytestring. If you want a Python Unicode string then use decode_contents instead.
decode_contents does the same thing as encode_contents but returns a Python Unicode string instead of an encoded bytestring.
def decode_contents(self, indent_level=None,
                    eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                    formatter="minimal"):
    """Renders the contents of this tag as a Unicode string.

    :param indent_level: Each line of the rendering will be
        indented this many spaces.
    :param eventual_encoding: The tag is destined to be
        encoded into this encoding. This method is _not_
        responsible for performing that encoding. This information
        is passed in so that it can be substituted in if the
        document contains a <META> tag that mentions the document's
        encoding.
    :param formatter: The output formatter responsible for converting
        entities to Unicode characters.
    """
BeautifulSoup 3
BeautifulSoup 3 doesn't have the functions above; instead it has renderContents:
def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                   prettyPrint=False, indentLevel=0):
    """Renders the contents of this tag as a string in the given
    encoding. If encoding is None, returns a Unicode string.."""
This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.