Parse HTML Table in PowerShell V3

Question

I have the following HTML table (link to HTML).

I want to parse it and convert it to XML, CSV, or a PowerShell object. I tried using HtmlAgilityPack.dll, but without success. Can anybody give me some directions?

I want to convert the table to a PSObject and export it to CSV. So far I only have the beginning of the code: I can access the rows, but I can't access the values in them.

Add-Type -Path C:\Windows\system32\HtmlAgilityPack.dll
$HTML = New-Object HtmlAgilityPack.HtmlDocument
$res = $HTML.Load("C:\Test\Test.html")
$table = $HTML.DocumentNode.SelectNodes("//table/tr/td/nobr")

When I access $table[0..47].InnerHtml I get only the first column of the file; I can't access the second column, and so on.

Thanks, Ohad

Recommended answer

You can try this to get all the HTML inside the <nobr> tags. I'll let you work out the logic to output what you want.

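$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("http://urltoyourfile.html")
# Wait until the page has finished loading before reading the document
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
($doc.getElementsByTagName("nobr")) | % { $_.innerHTML }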

Output:

Lead User&nbsp;&nbsp;
Accesses&nbsp;&nbsp;
Last Accessed&nbsp;&nbsp;
Average&nbsp;&nbsp;
Max&nbsp;&nbsp;
Min&nbsp;&nbsp;
Total&nbsp;&nbsp;
amirt
2
01/20/2013 09:40:47
04:18:17
06:19:26
02:17:09
08:36:35
andream
1
01/20/2013 10:33:01
02:34:37
02:34:37
02:34:37
02:34:37
avnerm
1
01/17/2013 11:34:16
00:30:44
00:30:44
00:30:44
00:30:44
brouria

One way to parse it:

$cpt = 1   # start at 1 so a line break is written after every 7th cell
($doc.getElementsByTagName("nobr"))|%{
    write-host -nonew $_.innerHTML";"
    $cpt++
    if ($cpt % 8 -eq 0){$cpt=1;write-host ""}
}
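
Since the goal in the question is a PSObject that can be exported to CSV, the same "one row every seven cells" idea can also build objects instead of writing to the host. The following is only a sketch under a few assumptions: the table has exactly seven columns, the first seven cells are the headers, and the sample values in $cells (copied from the output above) stand in for the real innerHTML strings. ConvertTo-CellText is a hypothetical helper name, not part of any library.

```powershell
# Sample cell values copied from the output above. In a real run they would
# come from: $doc.getElementsByTagName("nobr") | % { $_.innerHTML }
$cells = @(
    'Lead User&nbsp;&nbsp;', 'Accesses&nbsp;&nbsp;', 'Last Accessed&nbsp;&nbsp;',
    'Average&nbsp;&nbsp;', 'Max&nbsp;&nbsp;', 'Min&nbsp;&nbsp;', 'Total&nbsp;&nbsp;',
    'amirt', '2', '01/20/2013 09:40:47', '04:18:17', '06:19:26', '02:17:09', '08:36:35',
    'andream', '1', '01/20/2013 10:33:01', '02:34:37', '02:34:37', '02:34:37', '02:34:37'
)

# Hypothetical helper: replace &nbsp; entities and trim surrounding whitespace.
function ConvertTo-CellText([string]$Text) {
    ($Text -replace '&nbsp;', ' ').Trim()
}

$columnCount = 7   # assumption: seven columns, as in the output above
$headers = $cells[0..($columnCount - 1)] | ForEach-Object { ConvertTo-CellText $_ }

# Walk the remaining cells one full row at a time; an incomplete trailing row is skipped.
$rows = for ($i = $columnCount; $i + $columnCount -le $cells.Count; $i += $columnCount) {
    $props = [ordered]@{}
    for ($c = 0; $c -lt $columnCount; $c++) {
        $props[$headers[$c]] = ConvertTo-CellText $cells[$i + $c]
    }
    [pscustomobject]$props
}

# CSV text, one string per line; the first line holds the column headers.
$csv = $rows | ConvertTo-Csv -NoTypeInformation
```

To write a file instead of keeping the CSV text in a variable, $rows can be piped to Export-Csv with -NoTypeInformation and a path of your choice.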
