Python: convert HTML to text and mimic formatting


Question

I'm learning BeautifulSoup and have found many "html2text" solutions, but the one I'm looking for should mimic the formatting:

<ul>
<li>One</li>
<li>Two</li>
</ul>

would become

* One
* Two

and

Some text
<blockquote>
More magnificent text here
</blockquote>
Final text

would become

Some text

    More magnificent text here

Final text

I'm reading the docs, but I'm not seeing anything straightforward. Any help? I'm open to using something other than BeautifulSoup.

Recommended answer

Take a look at Aaron Swartz's html2text script (it can be installed with pip install html2text). Note that its output is valid Markdown. If for some reason that doesn't fully suit you, some fairly trivial tweaks should get you the exact output in your question:

In [1]: import html2text

In [2]: h1 = """<ul>
   ...: <li>One</li>
   ...: <li>Two</li>
   ...: </ul>"""

In [3]: print(html2text.html2text(h1))
  * One
  * Two

In [4]: h2 = """<p>Some text
   ...: <blockquote>
   ...: More magnificent text here
   ...: </blockquote>
   ...: Final text</p>"""

In [5]: print(html2text.html2text(h2))
Some text

> More magnificent text here

Final text
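
One possible form of the "trivial tweaks" mentioned above is a small post-processing pass over the Markdown text. The sketch below is only an illustration: the function name mimic_plain_text is hypothetical, and it assumes the Markdown looks like the output shown in the session (bullets indented by two spaces, single-level blockquotes marked with "> "). Nested lists and nested blockquotes are not handled.

```python
import re


def mimic_plain_text(markdown: str) -> str:
    """Rewrite html2text-style Markdown into the plain layout from the question.

    Hypothetical helper: assumes single-level lists and blockquotes only.
    """
    lines = []
    for line in markdown.splitlines():
        if line.startswith(">"):
            # Blockquote line: drop the ">" marker and indent by four spaces.
            text = line.lstrip(">").strip()
            lines.append(("    " + text).rstrip())
        else:
            # List item: drop the indentation that precedes the bullet.
            lines.append(re.sub(r"^\s+([*+-] )", r"\1", line))
    # Trim the blank lines at the start and end of the converted text.
    return "\n".join(lines).strip("\n")


# Sample inputs copied from the session output above.
list_markdown = "  * One\n  * Two\n\n"
quote_markdown = "Some text\n\n> More magnificent text here\n\nFinal text\n\n"

print(mimic_plain_text(list_markdown))
print(mimic_plain_text(quote_markdown))
```

With html2text installed, the same helper could be applied directly to its result, for example mimic_plain_text(html2text.html2text(h2)). The exact Markdown that html2text emits may differ between versions, so the patterns may need adjusting.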
