HtmlWebResponseObject.ParsedHtml replacement in PowerShell Core 6

Question

My goal is to parse an HTML file retrieved with Invoke-WebRequest. If possible, I'd like to avoid any external libraries.

The problem I am facing is that, since PowerShell 6, Invoke-WebRequest returns a BasicHtmlWebResponseObject instead of an HtmlWebResponseObject. The Basic version lacks the ParsedHtml property. Is there a good alternative for parsing HTML in PowerShell Core 6?

I've tried Select-Xml, but my HTML is not entirely valid (e.g. a closing tag is missing), so it fails to parse the content.

Another alternative I've found is New-Object -ComObject "HTMLFile", but as I understand it this relies on Internet Explorer for parsing, which I'd like to avoid.

There is a very similar question here, but sadly it has had no answers or activity for 8 months.

Recommended answer

As mentioned in the comments, this is not really possible without a library. One very good library you could use is AngleSharp for .NET. It has strong HTML parsing capabilities, and .NET code interoperates well with PowerShell; have a look at this link.

Here is an example from their website:

using System.Linq;
using AngleSharp;

// Default configuration plus a loader, so documents can be fetched over HTTP.
var config = Configuration.Default.WithDefaultLoader();
var address = "https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes";
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(address);
// Select the third cell of every episode row and read its text.
var cellSelector = "tr.vevent td:nth-child(3)";
var cells = document.QuerySelectorAll(cellSelector);
var titles = cells.Select(m => m.TextContent);
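
Since the question is about PowerShell, the same idea can be sketched there as well. This is only a rough illustration, not something taken from the AngleSharp site: it assumes AngleSharp 0.10 or later (where the parser lives in the AngleSharp.Html.Parser namespace), and the DLL path is hypothetical. On some PowerShell versions, AngleSharp's own dependencies may need to be loaded too.

```powershell
# Assumption: AngleSharp.dll was extracted from the NuGet package beforehand.
# The path below is hypothetical; adjust it to wherever the assembly lives.
Add-Type -Path './lib/AngleSharp.dll'

# In PowerShell 6+ this returns a BasicHtmlWebResponseObject, so only the
# raw markup in .Content is used; ParsedHtml is not needed.
$uri      = 'https://en.wikipedia.org/wiki/List_of_The_Big_Bang_Theory_episodes'
$response = Invoke-WebRequest -Uri $uri

# Parse the markup with AngleSharp's HTML parser, which follows HTML5
# error recovery and therefore tolerates missing closing tags.
$parser   = [AngleSharp.Html.Parser.HtmlParser]::new()
$document = $parser.ParseDocument([string]$response.Content)

# Same CSS selector as in the C# example above.
$titles = $document.QuerySelectorAll('tr.vevent td:nth-child(3)') |
    ForEach-Object { $_.TextContent }

$titles
```

Here Invoke-WebRequest still does the download and AngleSharp only does the parsing, so the existing request logic (headers, sessions, proxies) would not have to change.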
