Why is this PowerShell code (Invoke-WebRequest / getElementsByTagName) so incredibly slow on my machines, but not others?
Problem description
I wrote some screen-scraping code in PowerShell and was surprised that it took around 30 seconds to parse a few HTML tables. I stripped it down to try to figure out where all the time was being spent, and it seems to be in the getElementsByTagName calls.
I've included a script below which, on my home desktop, my work desktop, and my home slate, takes around 1-2 seconds for each iteration (full results pasted below). However, other people in the PowerShell community are reporting far shorter times (tens of milliseconds for each iteration).
I'm struggling to find any way of narrowing down the problem, and there doesn't seem to be a pattern to the OS/PS/.NET/IE versions.
The desktop I'm currently running it on is a brand new Windows 8 install with only PS3 and .NET 4.5 installed (and all Windows Update patches). No Visual Studio. No PowerShell profile.
$url = "http://www.icy-veins.com/restoration-shaman-wow-pve-healing-gear-loot-best-in-slot"
$response = (iwr $url).ParsedHtml
# Loop through the h2 tags
$response.body.getElementsByTagName("h2") | foreach {
    # Get the table that comes after the heading
    $slotTable = $_.nextSibling
    # Grab the rows from the table, skipping the first row (column headers)
    measure-command { $rows = $slotTable.getElementsByTagName("tr") | select -Skip 1 } | select TotalMilliseconds
}
Results from my desktop (the work PC and slate give near identical results):
TotalMilliseconds
-----------------
1575.7633
2371.5566
1073.7552
2307.8844
1779.5518
1063.9977
1588.5112
1372.4927
1248.7245
1718.3555
3283.843
2931.1616
2557.8595
1230.5093
995.2934
However, some people in the Google+ PowerShell community reported results like this:
TotalMilliseconds
-----------------
76.9098
112.6745
56.6522
140.5845
84.9599
48.6669
79.9283
73.4511
94.0683
81.4443
147.809
139.2805
111.4078
56.3881
41.3386
I've tried both PowerShell ISE and a standard console, no difference. For the work being done, these times seem kinda excessive, and judging by the posts in the Google+ community, it can go quicker!
Recommended answer
I got the same slowness running the script in a 64-bit session, but in 32-bit mode everything is very fast!
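To confirm which mode a given session is actually running in, a check along these lines may help. It is only a sketch: the function name Get-ProcessBitness is made up for illustration and is not part of PowerShell, while [IntPtr]::Size is a standard .NET member.

```powershell
# Hypothetical helper (not from the original post): reports whether the
# current PowerShell process is running as 32-bit or 64-bit.
function Get-ProcessBitness {
    # IntPtr is 4 bytes wide in a 32-bit process and 8 bytes in a 64-bit one.
    return [IntPtr]::Size * 8
}

"This session is {0}-bit" -f (Get-ProcessBitness)
```

On 64-bit Windows, the 32-bit console is normally found at $env:windir\SysWOW64\WindowsPowerShell\v1.0\powershell.exe (listed as "Windows PowerShell (x86)"), so the timing script can be run in both and compared.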
Lee Holmes was able to reproduce the issue, and here is his writeup:
"The issue is that he’s piping COM objects into another cmdlet – in this case, Select-Object. When that happens, we attempt to bind parameters by property name. Enumerating property names of a COM object is brutally slow – so we’re spending 86% of our time on two very basic CLR API calls:
(…)
// Get the function description from a COM type
typeinfo.GetFuncDesc(index, out pFuncDesc);
(…)
// Get the function name from a COM function description
typeinfo.GetDocumentation(funcdesc.memid, out strName, out strDoc, out id, out strHelp);
(…)
We might be able to do something smart here with caching.
A workaround is to not pipe into Select-Object, but instead use language features:
# Grab the rows from the table, skipping the first row (column headers)
$allRows = @($slotTable.getElementsByTagName("tr"))
$rows = $allRows[1..$allRows.Count]
"
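The workaround quoted above can be wrapped in a small helper that indexes into an array instead of piping into Select-Object. This is a sketch under assumptions: Skip-FirstItem is a hypothetical name, not something from Lee Holmes's writeup, and the speed-up on COM collections is inferred from his explanation rather than measured here. The helper also stops the range at Count - 1 and handles collections with too few items, where a range such as 1..0 would count downwards.

```powershell
# Hypothetical helper: returns every item except the first $Skip ones,
# using array indexing rather than the pipeline and Select-Object.
function Skip-FirstItem {
    param(
        [object[]] $Items = @(),
        [ValidateRange(0, 2147483647)]
        [int] $Skip = 1
    )

    # Nothing left after skipping: return an empty array instead of letting
    # a descending range (for example 1..0) pick up unwanted items.
    if ($Items.Count -le $Skip) {
        return ,@()
    }

    # Slice from $Skip up to the last valid index; the unary comma keeps the
    # result as an array even when only one item remains.
    return ,($Items[$Skip..($Items.Count - 1)])
}

# Example with plain values. With the question's code the call might be:
#   $rows = Skip-FirstItem -Items @($slotTable.getElementsByTagName("tr"))
$rows = Skip-FirstItem -Items @('header', 'row1', 'row2')
$rows
```

Wrapping the COM collection in @(...) before the call mirrors the quoted workaround, so the items are copied into an ordinary array once and no COM object is sent down the pipeline to a cmdlet.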