How to strip off HTML content using C#

Views: 63 Published: 2019/6/15 12:25:46 C# XML HTML

Question

Hi,

I want to extract only the text from an HTML document.
How can I do it? Please guide me.

Solution

You can use the Html Agility Pack for this:

using HtmlAgilityPack; // first, add a reference to HtmlAgilityPack.dll, and then add this line at the top of your code file

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<u>This is a</u> <strong>HTML</strong> <em>text</em>!"); // you can also load an HTML file, using the Load() method
string textOnly = doc.DocumentNode.InnerText;

You can also use a regular expression:

string html = "This is a <strong id='anId'>HTML</strong> <em><u>text</u></em>.";
string textOnly = System.Text.RegularExpressions.Regex.Replace(html, "<(.+?)>", "");

Hope this helps.
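
For reference, the regular-expression approach above can be wrapped in a small helper. This is only a minimal sketch under a few assumptions: the class name `HtmlText` and the method name `StripTags` are made up for illustration, the pattern `<[^>]+>` is swapped in for `<(.+?)>` so that tags broken across several lines are matched too, and `System.Net.WebUtility.HtmlDecode` is added to turn entities such as `&amp;` back into characters. A regular expression is not an HTML parser, so the text inside `<script>` or `<style>` elements is kept and a `>` inside a comment or attribute value can confuse it; for such input a parser like Html Agility Pack is likely the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HtmlText
{
    // Matches a single tag. The negated class [^>] also matches newlines,
    // so a tag that spans several lines is removed as well.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Matches runs of whitespace that are left behind once tags are gone.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // 1. Remove the tags themselves.
        string withoutTags = Tag.Replace(html, string.Empty);

        // 2. Decode entities (&amp; becomes &, &lt; becomes <, and so on).
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // 3. Collapse whitespace runs to one space and trim the ends.
        return Whitespace.Replace(decoded, " ").Trim();
    }
}
```

Calling `HtmlText.StripTags("This is a <strong id='anId'>HTML</strong> <em><u>text</u></em>.")` returns `This is a HTML text.`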
