Windows PowerShell: parse a local HTML file

Question

I would like to build an array from an HTML file using PowerShell.

I am using a script that downloads the HTML file of the Mozilla Firefox Developer Edition page (the index file) to the local disk, and I would like to parse it to get the values of the option elements inside the select element whose id is id_country.

I have been advised to use XPath for that, but I can't figure out how to parse the file and build an array from the result. Maybe a regex could be a workaround.
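For a single, predictable select block like this one, a regex can work as a stopgap, although it breaks easily on real-world HTML (different attribute order, single-quoted attributes, comments). A minimal sketch, assuming the value attributes are double-quoted; the inline $html sample stands in for the downloaded file, and the file name in the comment is hypothetical:

```powershell
# Sample input. In practice this would be the downloaded page, e.g.
# $html = Get-Content -Raw .\index.html   (hypothetical file name)
$html = @'
<select aria-required="true" id="id_country" name="country" required="required">
   <option value="af">Afghanistan</option>
   <option value="al">Albania</option>
   <option value="dz">Algeria</option>
</select>
'@

# Isolate the <select id="id_country"> block first so that options from other
# <select> elements on the page are not picked up. (?s) lets . match newlines.
$select = [regex]::Match($html, '(?s)<select[^>]*id="id_country"[^>]*>(.*?)</select>')

# Collect every option value inside that block into an array.
# @(...) keeps the result an array even when only one option matches.
$values = @()
if ($select.Success) {
    $values = @([regex]::Matches($select.Groups[1].Value, '<option[^>]*value="([^"]*)"') |
        ForEach-Object { $_.Groups[1].Value })
}

$values   # → af, al, dz (one per line)
```

Isolating the select block before matching the options is the main design choice: it keeps the sketch from collecting option values belonging to unrelated form fields.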

The HTML file is here:

http://pastebin.com/b8cShFLA

I would like to get all the values of the option elements here:

<select aria-required="true" id="id_country" name="country" required="required">
   <option value="af">Afghanistan</option>
   <option value="al">Albania</option>
   <option value="dz">Algeria</option>
   <option value="as">American Samoa</option>
   <option value="ad">Andorra</option>

...

I am quite new to PowerShell, which is why I'm not really aware of the different solutions I could use. I need something fast, as it's part of a package installer.

Basically, the script checks whether there is an installer matching the locale of the user's computer and, if not, falls back to English. That's why I need the values from that list: to check which locales are available for Firefox Developer Edition.
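The locale check described above can be sketched as a three-step lookup: exact match, then language-only match, then English. This is a sketch under assumptions: Select-InstallerLocale is a hypothetical helper, the $available list is made-up sample data standing in for the codes scraped from the page, and "en-US" as the name of the English installer is an assumption. Note also that the values in the pasted select (af, al, dz) look like country codes rather than locale codes, so whether they map directly onto installer locales is not confirmed here.

```powershell
# Hypothetical list of available locales (sample data only)
$available = @('de', 'en-US', 'fr', 'ja')

# Hypothetical helper: pick the best installer locale for a culture name
function Select-InstallerLocale {
    param(
        [string[]] $Available,
        [string] $Culture = (Get-Culture).Name,   # e.g. "fr-FR" on a French system
        [string] $Default = 'en-US'               # assumed English fallback
    )
    # 1. Exact match, e.g. "en-US" (-contains is case-insensitive)
    if ($Available -contains $Culture) { return $Culture }

    # 2. Language-only match, e.g. "fr-CA" -> "fr"
    $language = ($Culture -split '-')[0]
    if ($Available -contains $language) { return $language }

    # 3. Nothing matched: fall back to English
    return $Default
}

Select-InstallerLocale -Available $available -Culture 'fr-CA'   # → fr
Select-InstallerLocale -Available $available -Culture 'pt-BR'   # → en-US
```

Called without -Culture, the helper uses the current user's culture from Get-Culture, which is the case the installer would normally hit.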

Regards,

Answer

I don't see a code sample to fix, so I'll make one.

If it were a remote HTML page I would use Invoke-WebRequest, but that doesn't work well with local files.

For local files, I would recommend using the HTML Agility Pack to parse the HTML and then XPath to select the options you're looking for. Example:

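# Load the HTML Agility Pack assembly (path relative to the current directory)
Add-Type -Path .\HTMLAgilityPack\HtmlAgilityPack.dll
$path = (Get-Item .\b8cShFLA.html).FullName

$doc = New-Object HtmlAgilityPack.HtmlDocument
# -Raw (PowerShell 3.0+) reads the file as one string instead of an array of lines
$doc.LoadHtml((Get-Content -Raw $path))

#Create hashtable to store data in
$langs = @{}

$doc.DocumentNode.SelectSingleNode("//select[@name='country']").SelectNodes("option") | ForEach-Object {
    # Read the value attribute by name instead of relying on attribute order
    $short = $_.GetAttributeValue("value", "")
    # Some HtmlAgilityPack versions parse <option> as an empty element, which
    # leaves the label in the following text node; handle both cases
    $long = $_.InnerText.Trim()
    if (-not $long) { $long = $_.NextSibling.InnerText.Trim() }

    #Store data in hashtable
    $langs[$short] = $long
}

$langs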

Output (hashtable entries come back in no particular order):

Name                           Value
----                           -----
rw                             Rwanda
tv                             Tuvalu
to                             Tonga
pn                             Pitcairn
bh                             Bahrain
lc                             Saint Lucia   
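The question asks for an array, while the answer fills a hashtable. Turning the hashtable into arrays takes one line each; a minimal sketch, where the small $langs literal is sample data standing in for the hashtable built above:

```powershell
# Sample hashtable standing in for the $langs built above
$langs = @{ rw = 'Rwanda'; tv = 'Tuvalu'; to = 'Tonga' }

# Array of option values (the codes), sorted because hashtable keys
# are returned in no particular order
$codes = @($langs.Keys | Sort-Object)

# Array of the visible labels, in the same order as $codes
$names = @($codes | ForEach-Object { $langs[$_] })

$codes   # → rw, to, tv (one per line)
$names   # → Rwanda, Tonga, Tuvalu (one per line)
```

If the original document order matters more than alphabetical order, creating the hashtable as [ordered]@{} instead of @{} preserves insertion order.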
