BeautifulSoup and ASP.NET/C#
Question
Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)? Is there a BeautifulSoup alternative or port that works well with ASP.NET/C#?
I plan to use the library to extract readable text from arbitrary URLs.
Thanks
Answer
The Html Agility Pack is a similar project for C# and .NET.
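For readers coming from BeautifulSoup, a rough equivalent of `soup.find_all("a", href=True)` in Html Agility Pack might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; `LinkFinder` and `FindLinks` are hypothetical names chosen for illustration. XPath via `SelectNodes` takes the place of BeautifulSoup's `find_all` filters.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkFinder
{
    // Hypothetical helper: parse an HTML string and collect every href value,
    // similar in spirit to BeautifulSoup's soup.find_all("a", href=True).
    public static List<string> FindLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches
        // under the default settings, so guard against that.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (var a in anchors)
            links.Add(a.GetAttributeValue("href", string.Empty));
        return links;
    }
}
```

Calling `LinkFinder.FindLinks(html)` on a page's markup returns the `href` values in document order. Anchors without an `href` attribute are skipped by the XPath predicate.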
Edit:
To extract all readable text:
doc.DocumentNode.InnerText
Note that this will also return the text content of <script> and <style> tags.
To fix that, remove all <script> and <style> elements before reading InnerText, like this:
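// Requires: using System.Linq; (for ToArray)
// ToArray() snapshots the matches so nodes can be removed while iterating.
foreach (var script in doc.DocumentNode.Descendants("script").ToArray())
    script.Remove();
foreach (var style in doc.DocumentNode.Descendants("style").ToArray())
    style.Remove();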
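Putting these pieces together for the original goal (readable text from an arbitrary URL), a minimal sketch might look like this. It assumes the HtmlAgilityPack NuGet package is referenced. `ReadableText`, `FromUrl` and `FromHtml` are hypothetical names. Removing HTML comments and collapsing whitespace are additions beyond the answer above. The `DeEntitize` call assumes `InnerText` leaves entities such as `&amp;` encoded, which is the behavior I am aware of but have not checked across all versions.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class ReadableText
{
    // Hypothetical helper: HtmlWeb.Load performs the HTTP GET and parses the result.
    public static string FromUrl(string url)
    {
        var doc = new HtmlWeb().Load(url);
        return FromDocument(doc);
    }

    // Hypothetical helper for markup that is already in memory.
    public static string FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return FromDocument(doc);
    }

    private static string FromDocument(HtmlDocument doc)
    {
        // Materialize the matches first; removing nodes while lazily
        // enumerating Descendants() would modify the tree mid-iteration.
        var junk = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style"
                        || n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var node in junk)
            node.Remove();

        // Assumption: InnerText does not decode entities, so decode them here.
        var text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse whitespace runs left behind by markup and indentation.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Usage would be along the lines of `ReadableText.FromUrl("https://example.com/")` (placeholder URL). This is a heuristic: navigation menus, footers and similar boilerplate still end up in the output, because only script, style and comment content is dropped.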
(Source: SLaks, http://stackoverflow.com/questions/2785092/c-htmlagilitypack-extract-inner-text/2785108#2785108)