How do I keep whitespace in BeautifulSoup.contents

Question

Most examples I find online show how to remove whitespace, but in my case I need to keep it. I have:

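from bs4 import BeautifulSoup

# Triple quotes keep the line breaks and runs of spaces exactly as written.
html = """I can flip this whole thing with one hand
               <span>D#m</span>
The ringleader man
<span>A#</span>                           <span>Dm</span>                          <span>A#</span>
I know~~~~ it's a fact that you'd rather just have some of me instead"""
bs = BeautifulSoup(html, 'html.parser')
# Serialize every top-level node back into one string (str replaces Python 2's unicode).
content = ''.join(str(node) for node in bs.contents)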

I expected this to keep the whitespace (the "html" variable holds the contents of a pre tag), but it seems to replace multiple spaces with a single space.

How do I keep or get the raw contents of a given BeautifulSoup parser?

Solution

The HTML parser seems to keep whitespace only if the content you are parsing is inside a <pre> tag. In my case, the pre tag had been removed. Adding

html = "<pre>" + html + "</pre>"

preserved the whitespace.
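
A minimal sketch of that workaround, assuming bs4 with the built-in 'html.parser' and Python 3. The helper name contents_with_whitespace is hypothetical (it is not part of BeautifulSoup), and the short html string is a stand-in for the lyrics above. It wraps the fragment in <pre>, parses it, and serializes only the children of the wrapper, so the added tag does not appear in the result.

```python
from bs4 import BeautifulSoup


def contents_with_whitespace(fragment):
    """Return the fragment's serialized contents with whitespace kept.

    Hypothetical helper (not part of BeautifulSoup): wraps the fragment
    in <pre> before parsing, then serializes only the children of <pre>.
    """
    soup = BeautifulSoup("<pre>" + fragment + "</pre>", "html.parser")
    # soup.pre is the wrapper added above; its children are the original nodes.
    return "".join(str(node) for node in soup.pre.contents)


# Stand-in fragment with runs of spaces between tags.
html = "<span>A#</span>     <span>Dm</span>   <span>A#</span>"

# Parsed as-is, runs of whitespace between tags may be collapsed.
plain = "".join(str(node) for node in BeautifulSoup(html, "html.parser").contents)

# Parsed inside the <pre> wrapper, the spacing should survive.
kept = contents_with_whitespace(html)

print(repr(plain))
print(repr(kept))
```

This sketch assumes the fragment does not itself contain a <pre> or </pre> tag; if it did, soup.pre would no longer correspond to the wrapper alone.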
