Convert HTML to Plain Text while preserving P, BR, UL, OL?
Problem description
While exporting HTML text to an Excel sheet, I'm trying to preserve basic formatting such as HTML line breaks (<br>, <p>), lists (<ol>, <ul>), etc.
Example input:
<p>This is a test.</p>
<p>This is another<br>test.</p>
<ul>
<li>10</li>
<li>20</li>
<li>30</li>
</ul>
<p>End.</p>
Example output:
This is a test.
This is another
test.
- 10
- 20
- 30
End.
The free utility HTMLAsText from the well-known NirSoft developer seems to do just what I want. Unfortunately, it comes with no source code.
Even after examining the roughly 20 similar questions here on Stack Overflow and browsing Google for hours, the closest thing I could find is this Code Project article.
So my question is:
Is anyone aware of a class/library that can convert HTML to plain text while preserving basic formatting?
Update 2013-05-10
I ended up writing a function; see the full code on Pastebin.
Recommended answer
Could you not just do the replacements yourself:
<br /> with Environment.NewLine
</p> with Environment.NewLine + Environment.NewLine
<li> with " - ".
Then just strip out the rest of the HTML with a regex? That would seem to produce the output in your example. Of course, someone may have a more elegant solution than that. =)
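The replacements suggested above can be sketched roughly as follows. This is only a minimal illustration, written in Python rather than the .NET code the answer implies (in .NET, `"\n"` would be `Environment.NewLine`, and the substitutions would go through `Regex.Replace`). The function name `html_to_text` is hypothetical. The line break before each list marker, the handling of `</ul>` and `</ol>`, the entity decoding, and the whitespace cleanup are assumptions added so that the sample input yields readable output; they are not part of the original answer.

```python
import html
import re


def html_to_text(markup: str) -> str:
    """Convert simple HTML to plain text, keeping paragraphs, breaks and lists."""
    flags = re.IGNORECASE
    # Line breaks in the HTML source carry no meaning, so collapse whitespace first.
    text = re.sub(r"\s+", " ", markup)
    # <br>, <br/> and <br /> become a single line break.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=flags)
    # A closing </p> ends a paragraph: two line breaks, as the answer suggests.
    text = re.sub(r"</p\s*>", "\n\n", text, flags=flags)
    # Each list item starts on its own line with a dash marker (assumption:
    # the answer only specifies the " - " marker, not the line break).
    text = re.sub(r"<li\b[^>]*>", "\n- ", text, flags=flags)
    # Assumption: a closing list tag also ends a block.
    text = re.sub(r"</(?:ul|ol)\s*>", "\n\n", text, flags=flags)
    # Strip every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; only after the tags are gone.
    text = html.unescape(text)
    # Tidy up: drop spaces around line breaks, allow at most one blank line.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "<p>This is a test.</p>\n"
        "<p>This is another<br>test.</p>\n"
        "<ul>\n<li>10</li>\n<li>20</li>\n<li>30</li>\n</ul>\n"
        "<p>End.</p>"
    )
    print(html_to_text(sample))
```

With the question's sample input, this puts each paragraph and the list in its own block separated by a blank line, and each list item on its own line prefixed with "- ". Regex-based stripping like this is fragile on malformed markup, comments, or `<script>` content, so an HTML parser is probably the safer choice for anything beyond simple, well-formed input.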