Windows PowerShell: parse a local HTML file
Question
I would like to build an array from an HTML file using PowerShell.
I am using a script that downloads the Mozilla Firefox Developer Edition HTML page (the index file) to a local file, and I would like to parse it to get the values of the option elements inside the select element whose id is id_country.
I have been advised to use XPath for this, but I can't figure out how to parse the file and build an array from the result. Maybe a regex could be a workaround.
The HTML file is here:
And I would like to get all the values of the option elements here:
<select aria-required="true" id="id_country" name="country" required="required">
<option value="af">Afghanistan</option>
<option value="al">Albania</option>
<option value="dz">Algeria</option>
<option value="as">American Samoa</option>
<option value="ad">Andorra</option>
...
I am quite new to PowerShell, so I am not really aware of the different solutions I might be able to use. I need something fairly fast, as it is part of a package installer.
Basically, the script will check whether there is an installer that matches the locale of the user's computer and, if not, default to English. That is why I need the values from that list: to check which locales Firefox Developer Edition is available in.
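The fallback described above might look roughly like the sketch below. The function name `Select-InstallerLocale`, its parameters, and the `en-US` default are hypothetical and only illustrate the idea: try the exact locale, then the language part alone, then fall back to English.

```powershell
# Hypothetical helper: picks the installer locale to use.
function Select-InstallerLocale {
    param(
        [string[]] $Available,          # codes scraped from the page
        [string]   $Wanted,             # e.g. (Get-Culture).Name
        [string]   $Default = 'en-US'   # assumed English fallback
    )

    # -contains compares strings case-insensitively.
    if ($Available -contains $Wanted) { return $Wanted }

    # Try the language part alone: 'fr-FR' -> 'fr'.
    $short = ($Wanted -split '-')[0]
    if ($Available -contains $short) { return $short }

    return $Default
}

# Usage with a made-up list of available codes.
Select-InstallerLocale -Available 'fr', 'de', 'en-US' -Wanted 'fr-FR'
```

In the real script, `-Wanted` would presumably be `(Get-Culture).Name` and `-Available` the keys collected from the select element.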
Regards,
Recommended answer
I don't see a code sample to fix, so I'll make one.
If it were a remote HTML page I would use Invoke-WebRequest, but that doesn't work too well with local files.
For local files I would recommend using HTML Agility Pack to parse the HTML file, and then XPath to select the options you're looking for. For example:
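# Load HTML Agility Pack (adjust the path to wherever the DLL lives)
Add-Type -Path .\HTMLAgilityPack\HtmlAgilityPack.dll
# Resolve the full path of the downloaded HTML file
$path = (Get-Item .\b8cShFLA.html).FullName
$doc = New-Object HtmlAgilityPack.HtmlDocument
# -Raw reads the file as one string; without it Get-Content returns an array of lines
$doc.LoadHtml((Get-Content -Raw -Path $path))
# Create a hashtable to store the data in
$langs = @{}
$doc.DocumentNode.SelectSingleNode("//select[@name='country']").SelectNodes("option") | ForEach-Object {
    # Read the value attribute by name rather than by position
    $short = $_.GetAttributeValue('value', '')
    # Here the option node itself is parsed as empty, so its label is the text node that follows it
    $long = $_.NextSibling.InnerText
    # Store the pair in the hashtable
    $langs[$short] = $long
}
$langs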
Output:
Name                           Value
----                           -----
rw                             Rwanda
tv                             Tuvalu
to                             Tonga
pn                             Pitcairn
bh                             Bahrain
lc                             Saint Lucia
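Since the question mentions regex as a possible workaround, here is a minimal sketch of that route with no external DLL. It is not a general HTML parser: it assumes the markup looks like the snippet in the question (double-quoted attributes, plain-text labels), and the sample `$html` string stands in for the downloaded file, which would normally be read with `Get-Content -Raw`.

```powershell
# Sample markup based on the question; the second select is only there
# to show that other lists are ignored.
$html = @'
<select aria-required="true" id="id_country" name="country" required="required">
<option value="af">Afghanistan</option>
<option value="al">Albania</option>
<option value="dz">Algeria</option>
</select>
<select id="other"><option value="zz">Ignored</option></select>
'@

# 1. Isolate the body of the select whose id is id_country.
#    (?s) lets . match newlines, (?i) ignores case, .*? stops at the first </select>.
$selectPattern = '(?si)<select\b[^>]*\sid="id_country"[^>]*>(.*?)</select>'
$selectMatch = [regex]::Match($html, $selectPattern)

# 2. Collect value/label pairs from the option tags inside it.
$langs = [ordered]@{}
if ($selectMatch.Success) {
    $optionPattern = '(?i)<option\b[^>]*\svalue="([^"]*)"[^>]*>([^<]*)'
    foreach ($m in [regex]::Matches($selectMatch.Groups[1].Value, $optionPattern)) {
        $langs[$m.Groups[1].Value] = $m.Groups[2].Value.Trim()
    }
}

$langs
```

`$langs.Keys` then gives the list of codes to compare against. If the page markup changes (single quotes, nested tags inside an option), the patterns would need adjusting, which is the usual argument for a real parser such as HTML Agility Pack.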