Why is this PowerShell code (Invoke-WebRequest / getElementsByTagName) so incredibly slow on my machines, but not others?


Problem description

I wrote some screen-scraping code in PowerShell and was surprised that it took around 30 seconds to parse a few HTML tables. I stripped it down to try to figure out where all the time was being spent, and it seems to be in the getElementsByTagName calls.

I've included a script below which, on my home desktop, my work desktop, and my home slate, takes around 1-2 seconds for each iteration (full results pasted below). However, other people in the PowerShell community are reporting far shorter times (only several milliseconds for each iteration).

I'm struggling to find any way of narrowing down the problem, and there doesn't seem to be a pattern in the OS/PS/.NET/IE versions.

The desktop I'm currently running it on is a brand new Windows 8 install with only PS3 and .NET 4.5 installed (plus all Windows Update patches). No Visual Studio. No PowerShell profile.

$url = "http://www.icy-veins.com/restoration-shaman-wow-pve-healing-gear-loot-best-in-slot"
$response = (iwr $url).ParsedHtml

# Loop through the h2 tags
$response.body.getElementsByTagName("h2") | foreach {

    # Get the table that comes after the heading
    $slotTable = $_.nextSibling

    # Grab the rows from the table, skipping the first row (column headers)
    measure-command { $rows = $slotTable.getElementsByTagName("tr") | select -Skip 1 } | select TotalMilliseconds
}

Results from my desktop (the work PC and slate give near-identical results):

TotalMilliseconds
-----------------
        1575.7633
        2371.5566
        1073.7552
        2307.8844
        1779.5518
        1063.9977
        1588.5112
        1372.4927
        1248.7245
        1718.3555
         3283.843
        2931.1616
        2557.8595
        1230.5093
         995.2934

However, some people in the Google+ PowerShell community reported results like this:

 TotalMilliseconds
 -----------------
           76.9098
          112.6745
           56.6522
          140.5845
           84.9599
           48.6669
           79.9283
           73.4511
           94.0683
           81.4443
           147.809
          139.2805
          111.4078
           56.3881
           41.3386

I've tried both PowerShell ISE and a standard console, with no difference. For the work being done, these times seem excessive, and judging by the posts in the Google+ community, it can go quicker!

Recommended answer

See my comment: https://connect.microsoft.com/PowerShell/feedback/details/778371/invoke-webrequest-getelementsbytagname-is-incredously-slow-on-some-machines#tabs

I got the same slowness when running the script in 64-bit mode, but when running in 32-bit mode, everything is very fast!
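
To see which mode a given session is actually running in, a minimal sketch like the following may help. It relies only on standard .NET properties. The SysWOW64 path is the usual location of the 32-bit Windows PowerShell on 64-bit Windows and is an assumption about the install, not something stated in the original post.

```powershell
# Report whether the current PowerShell process is 64-bit or 32-bit.
$is64BitProcess = [Environment]::Is64BitProcess   # available on .NET 4.0 and later
$pointerBits    = [IntPtr]::Size * 8              # 64 in a 64-bit process, 32 in a 32-bit one

"64-bit process: $is64BitProcess (${pointerBits}-bit pointers)"

# Assumed default location of the 32-bit Windows PowerShell on 64-bit Windows.
if ($env:windir) {
    $ps32 = Join-Path $env:windir 'SysWOW64\WindowsPowerShell\v1.0\powershell.exe'
    if ($is64BitProcess -and (Test-Path $ps32)) {
        "To compare timings in 32-bit mode, start: $ps32"
    }
}
```

Running the timing script once in each host makes it easier to confirm whether the slowdown follows the process bitness on a particular machine.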

Lee Holmes was able to reproduce the issue, and here is his writeup:

"The issue is that he’s piping COM objects into another cmdlet – in this case, Select-Object. When that happens, we attempt to bind parameters by property name. Enumerating property names of a COM object is brutally slow – so we’re spending 86% of our time on two very basic CLR API calls:

(…)
// Get the function description from a COM type
typeinfo.GetFuncDesc(index, out pFuncDesc);
(…)
// Get the function name from a COM function description
typeinfo.GetDocumentation(funcdesc.memid, out strName, out strDoc, out id, out strHelp);
(…)

We might be able to do something smart here with caching.

A workaround is to not pipe into Select-Object, but instead use language features:

# Grab the rows from the table, skipping the first row (column headers)
$allRows = @($slotTable.getElementsByTagName("tr"))
$rows = $allRows[1..$allRows.Count]

"
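
The same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: `Get-AllButFirst` is a hypothetical name that does not appear in the original answer, and a plain array stands in here for the COM collection returned by getElementsByTagName.

```powershell
# Hypothetical helper: return every element except the first without using the pipeline.
function Get-AllButFirst {
    param($Items)

    # Materialize the collection with the array operator, as in the workaround above.
    $all = @($Items)

    # Zero or one element: nothing is left once the first is skipped.
    if ($all.Count -lt 2) { return ,@() }

    # Slice from the second element through the last; the unary comma keeps the result an array.
    return ,($all[1..($all.Count - 1)])
}

# Plain array standing in for the table rows (header row first).
$rows = Get-AllButFirst @('header', 'row1', 'row2')
$rows -join ', '   # row1, row2
```

In the scraping loop this would presumably be called as `Get-AllButFirst $slotTable.getElementsByTagName("tr")`, so the COM objects are passed as an argument rather than piped into a cmdlet.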
