Convert HTML to Plain Text while preserving P, BR, UL, OL?

Views: 440 Published: 2016/10/1 19:50:39 Tags: c# html string html-parsing

Question

While exporting HTML text to an Excel sheet, I'm trying to preserve basic formatting such as HTML line breaks (<br>, <p>), lists (<ol>, <ul>), etc.

Example input:

<p>This is a test.</p>
<p>This is another<br>test.</p>

<ul>
    <li>10</li>
    <li>20</li>
    <li>30</li>
</ul>

<p>End.</p>

Example output:

This is a test.

This is another
test.

- 10
- 20
- 30

End.

The free utility HTMLAsText from the famous NirSoft guy seems to do just what I want; unfortunately, it comes with no source code.

Even after examining the approximately 20 similar questions here on Stack Overflow and browsing Google for hours, the closest thing I could find is this Code Project article.

So my question is:

Is anyone aware of a class/library that can convert HTML to plain text while preserving basic formatting?

Update, May 10, 2013

I ended up with a function; the full code is on Pastebin.

Recommended answer

Couldn't you just do the replacements yourself:

<br /> with Environment.NewLine
</p> with Environment.NewLine + Environment.NewLine
<li> with " - ".

Then just strip out the rest of the HTML with a regex? That would seem to produce the output shown in your example. Of course, someone may have a more elegant solution than that. =)
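The replacements suggested above could be sketched roughly as follows. This is only a minimal illustration, not the function the asker ended up with: the class name `HtmlToTextSketch` and the method `ToPlainText` are made up for this example, and the whitespace collapsing, the blank line after `</ul>`/`</ol>`, and the entity decoding are extra assumptions added so that the sample input yields the sample output. Regex-based tag stripping is naive and will likely misbehave on comments, scripts, or malformed markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class HtmlToTextSketch
{
    public static string ToPlainText(string html)
    {
        string nl = Environment.NewLine;

        // Assumption: whitespace in the HTML source carries no meaning, so collapse it first.
        string text = Regex.Replace(html, @"\s+", " ");

        // <br>, <br/> and <br /> become a single line break.
        text = Regex.Replace(text, @"\s*<br\s*/?>\s*", nl, RegexOptions.IgnoreCase);

        // </p> (and, as an assumption, </ul> and </ol>) end a block: add a blank line.
        text = Regex.Replace(text, @"\s*</(p|ul|ol)\s*>\s*", nl + nl, RegexOptions.IgnoreCase);

        // Each <li> starts a new line prefixed with "- ".
        text = Regex.Replace(text, @"\s*<li\b[^>]*>\s*", nl + "- ", RegexOptions.IgnoreCase);

        // Strip whatever tags remain (naive: does not handle comments or scripts).
        text = Regex.Replace(text, @"<[^>]+>", string.Empty);

        // Decode entities such as &amp; after the tags are gone.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of three or more line breaks into one blank line.
        text = Regex.Replace(text, @"(\r?\n){3,}", nl + nl);

        return text.Trim();
    }
}
```

Calling `HtmlToTextSketch.ToPlainText(html)` on the example input from the question should give the example output, with lines separated by `Environment.NewLine`. For anything beyond simple markup, a real HTML parser is probably a safer choice than regular expressions.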
