Complete HTML Strip function

Views: 396 Published: 2015/11/26 20:31:37 Tags: c# html .net regex winforms

Question

I have an HTML string like this:

<p>First Sentence is this.&#160;Second sentence is this.</p>

I can remove the <p> tags from the above string with a regex.

But how do I remove the &#160; encoded character from the above string in WinForms?

I don't want &#160; to be present in the output.

Recommended answer

You can use XElement.Parse to get the node value like this:

 var htmlString = "<p>First Sentence is this.&#160;Second sentence is this.</p>";
 var result = System.Xml.Linq.XElement.Parse(htmlString).Value;

If not all of the strings contain a valid XML structure, or some have no tags at all, you can wrap the string in a dummy root element like this:

 var htmlString = "<p>First Sentence is this.&#160;Second sentence is this.</p>";
 var result = System.Xml.Linq.XElement.Parse("<root>" + htmlString + "</root>").Value;

Result: the tags are stripped and &#160; is decoded to the non-breaking space character (U+00A0).

You may want to add error handling, since XElement.Parse throws an XmlException on input that is not well-formed XML, but this approach is more robust than stripping tags with a regex.
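The error handling mentioned above could look roughly like the following. This is a minimal sketch, not part of the original answer: the class name HtmlText and the method TryStripTags are hypothetical, and returning null on failure is just one possible convention.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class HtmlText
{
    // Hypothetical helper: returns the text content of an XML-compatible
    // HTML fragment, or null when the fragment cannot be parsed as XML.
    public static string TryStripTags(string fragment)
    {
        if (string.IsNullOrEmpty(fragment))
        {
            return fragment;
        }

        try
        {
            // The dummy <root> wrapper lets tagless strings and fragments
            // with several top-level elements parse as a single document.
            return XElement.Parse("<root>" + fragment + "</root>").Value;
        }
        catch (XmlException)
        {
            // Typical causes: unclosed tags such as <br>, or named HTML
            // entities such as &nbsp; that plain XML does not define.
            return null;
        }
    }
}
```

Calling HtmlText.TryStripTags on the string from the question returns the sentence text with a U+00A0 character between the two sentences. A null result signals that the input needs a different strategy, such as decoding the entities directly.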

Edit:

If this still does not work and you only want to handle the entities, you can use the System.Web.HttpUtility.HtmlDecode method to replace HTML entities with their literal characters:

var final_result = System.Web.HttpUtility.HtmlDecode(result);
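For a WinForms project that does not reference System.Web, System.Net.WebUtility.HtmlDecode is an alternative that decodes entities the same way. The sketch below is an illustration rather than the original answer's code: the class HtmlEntityCleaner and the method ToPlainText are hypothetical names, the tag-stripping regex is deliberately naive, and swapping the decoded non-breaking space for an ordinary space is an assumption about what the asker wants in the output.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlEntityCleaner
{
    // Hypothetical helper: strips tags, decodes entities, and normalizes
    // non-breaking spaces to ordinary spaces.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Naive tag removal; adequate for simple fragments only, not for
        // scripts, comments, or attributes containing '>'.
        string withoutTags = Regex.Replace(html, "<[^>]+>", string.Empty);

        // Decodes numeric and named entities, e.g. &#160; becomes U+00A0.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Assumption: the caller wants a regular space instead of U+00A0.
        return decoded.Replace('\u00A0', ' ');
    }
}
```

With the string from the question, HtmlEntityCleaner.ToPlainText returns "First Sentence is this. Second sentence is this." with a regular space between the sentences.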
