HTML to RTF string using Python

Question

I am looking for a way to convert HTML text to an RTF string. Are there any libraries that do this job? I get HTML content dynamically in my project and need it to be rendered in RTF format. I am using an HTML parser to convert the HTML text to a plain string and then trying to use PyRTF to convert that to RTF. Is there a better way to do this? Thanks in advance.
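
The parse-then-emit pipeline described in the question can be sketched with only the standard library, skipping the intermediate plain string so that basic formatting survives. This is a minimal sketch, not how PyRTF works: it assumes only a handful of tags matter (b/strong, i/em, u, br, and block tags that end a paragraph), uses a single hypothetical Helvetica entry in the font table, and ignores CSS, lists, tables, and images. The names html_to_rtf, rtf_escape, and HTMLToRTF are illustrative. Escaping is where many of the RTF gotchas live: backslashes and braces must be escaped, and non-ASCII text needs \uN? escapes with signed 16-bit values.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag coverage for this sketch; anything else is dropped silently.
INLINE = {"b": r"\b ", "strong": r"\b ", "i": r"\i ", "em": r"\i ", "u": r"\ul "}
BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP = {"script", "style"}


def rtf_escape(text: str) -> str:
    """Escape plain text for an RTF body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF metacharacters need a backslash
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: \uN? with N a signed 16-bit UTF-16 code unit; characters
            # outside the BMP become a surrogate pair. "?" is the fallback glyph.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                out.append(f"\\u{unit - 65536 if unit > 32767 else unit}?")
    return "".join(out)


class HTMLToRTF(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. decoded before handle_data
        self.parts = []
        self.open_groups = 0  # RTF groups opened for inline tags
        self.skip_depth = 0   # > 0 while inside <script>/<style>
        self.line_start = True

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in INLINE:
            self.parts.append("{" + INLINE[tag])
            self.open_groups += 1
        elif tag == "br":
            self.parts.append("\\line ")
            self.line_start = True

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in INLINE and self.open_groups:
            self.parts.append("}")
            self.open_groups -= 1
        elif tag in BLOCK:
            self.parts.append("\\par\n")  # the newline is ignored by RTF readers
            self.line_start = True

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # HTML collapses whitespace runs
        if self.line_start:
            text = text.lstrip()
        if text:
            self.parts.append(rtf_escape(text))
            self.line_start = False


def html_to_rtf(html: str) -> str:
    parser = HTMLToRTF()
    parser.feed(html)
    parser.close()
    # Close inline groups left open by sloppy HTML so braces stay balanced.
    body = "".join(parser.parts) + "}" * parser.open_groups
    return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\n" + body + "}"


print(html_to_rtf("<p>Hello <b>world</b> &amp; caf\u00e9</p>"))
```

Closing groups by counting rather than by matching tag names keeps the output well-formed even for misnested HTML, at the cost of occasionally applying a style slightly longer than the source intended.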

Recommended answer

RTF seems a dicey format to convert from/to. I've tried cutting and pasting among applications on Mac OS X, for example, where RTF is something of a lingua franca. Some of those apps are Microsoft apps (relevant in that RTF is a Microsoft-developed format), others are not. Even basic formatting information like font size, font face, line spacing, and list styling (ordered or unordered) is jumbled when copying from one ostensibly RTF-speaking app to another. Simply put, it's a mess.

I have searched for ways to programmatically read, write, and transform RTF, preferably from Python. I found a number of packages on PyPI, but trying them out was a disappointing experience. They would support RTF 1.5, say, when the current version is 1.9.1. RTF has been around a long time, but a 2005-vintage spec is not very recent. There were lots of gotchas and incompatibilities. LOTS.

Now, I'm not saying it's impossible, or that there aren't other libraries out there that would do the trick. I have not tried the zopyx.convert mentioned by others here, for example. Maybe it's great. But judging by its dependencies (Java, FOP, etc.), it looks like a pretty complex and therefore likely fragile toolchain. I read its code on GitHub, and the Python is really only there as a coordination veneer. It orchestrates the external tools XFC, XINC, FOP, and PrinceXML, three of which are commercial software. That includes XFC, the key part that deals with RTF. Color me skeptical.

I've found two converters that are worth a look. If you're using a Mac, the textutil command-line program is actually one of the better and simpler tools I've seen. For example, converting an RTF file to HTML:

textutil -convert html filename.rtf -output filename.html
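
For the direction the question asks about, the same tool can be driven from Python to return an RTF string directly. This is a sketch under assumptions: it relies on textutil's -stdin, -stdout, -format, and -convert options as I understand them from the macOS man page (verify with man textutil on your system), and the helper names are hypothetical. It only works on macOS.

```python
import subprocess
import sys


def textutil_html_to_rtf_cmd() -> list[str]:
    # Read HTML from stdin, force it to be parsed as HTML, write RTF to stdout.
    return ["textutil", "-stdin", "-stdout", "-format", "html", "-convert", "rtf"]


def html_to_rtf_textutil(html: str) -> str:
    """Convert an HTML string to an RTF string with macOS textutil."""
    if sys.platform != "darwin":
        raise RuntimeError("textutil is only available on macOS")
    result = subprocess.run(
        textutil_html_to_rtf_cmd(),
        input=html.encode("utf-8"),
        capture_output=True,
        check=True,  # raise CalledProcessError if textutil fails
    )
    # RTF output is essentially 7-bit; latin-1 maps every byte, so decoding cannot fail.
    return result.stdout.decode("latin-1")
```

If non-ASCII characters come out garbled, declaring the encoding inside the HTML (for example with a meta charset tag) is a reasonable first thing to try.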

The other formatting engine that's worth considering is LibreOffice. It's free, open source, reasonably amenable to automation, and a decent foundation as an interoperability hub. That's not just a guess; I've built complex, multi-format document workflows around it.
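
A common way to automate LibreOffice from Python is to shell out to its headless batch converter. The sketch below writes the HTML to a temporary file and runs soffice --headless --convert-to rtf --outdir DIR FILE; the helper names are hypothetical, and it assumes soffice (or libreoffice) is on PATH. Depending on the LibreOffice version, an HTML file may be opened in Writer/Web rather than Writer, and the plain rtf target may then need an explicit export filter; the target parameter is there for that case (something like "rtf:Rich Text Format", which is an unverified assumption to check against your installation's filter list).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def soffice_cmd(src: Path, outdir: Path, binary: str = "soffice", target: str = "rtf") -> list[str]:
    # Headless batch conversion; the result is written as <outdir>/<src stem>.rtf
    return [binary, "--headless", "--convert-to", target, "--outdir", str(outdir), str(src)]


def html_to_rtf_libreoffice(html: str, target: str = "rtf") -> str:
    """Convert an HTML string to an RTF string by shelling out to LibreOffice."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        src = workdir / "input.html"
        src.write_text(html, encoding="utf-8")
        proc = subprocess.run(
            soffice_cmd(src, workdir, binary, target), capture_output=True, text=True
        )
        out = workdir / "input.rtf"
        # Check for the output file rather than trusting the exit status alone.
        if proc.returncode != 0 or not out.exists():
            raise RuntimeError(f"conversion failed: {proc.stderr.strip()}")
        # latin-1 maps every byte, so reading the RTF back cannot fail.
        return out.read_text(encoding="latin-1")
```

Each call starts a fresh LibreOffice process, which is slow; for many documents, converting a whole directory in one soffice invocation is usually the better trade-off.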

I would question why you're trying to get into RTF. That seems like a document format you'd be trying to escape from. But if you need to go there, textutil and LibreOffice are the least-worst mechanisms I've found.