Why is a PowerShell workflow significantly slower than a non-workflow script for XML file analysis?

Question

I am writing a PowerShell program to analyse the content of 1900+ big XML configuration files (50,000+ lines, 1.5 MB each). As a test, I copied 36 of the files to my PC (Windows 10, PowerShell 5.1, 32 GB RAM) and wrote a quick script to measure the execution speed.

$TestDir = "E:\Powershell\Test"
$TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

foreach ($TestXML in $TestXMLs)
{
    [xml]$XML = Get-Content $TestXML
    (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
}

This completes in 36 to 40 seconds. I ran several tests with Measure-Command.

Then I tried a workflow with foreach -parallel, assuming that loading several files in parallel would make the process faster.

Workflow Test-WF
{
    $TestDir = "E:\Powershell\Test"
    $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

    foreach -parallel -throttle 10 ($TestXML in $TestXMLs)
    {
        [xml]$XML = Get-Content $TestXML
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }
}

Test-WF #execute workflow

The script with the workflow needs between 118 and 132 seconds.

Now I am wondering why the workflow runs so much slower. Is it the recompilation to XAML, or perhaps a slower algorithm for loading XML files in WWF?

Recommended answer

foreach -parallel is by far the slowest parallelization option you have in PowerShell, since workflows are designed not for speed but for long-running operations that can be safely interrupted and resumed.

The implementation of these safety mechanisms introduces some overhead, which is why your script is slower when run as a workflow.

If you want to optimize for execution speed, use runspaces instead:

$TestDir = "E:\Powershell\Test"
$TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

# Set up runspace pool
$RunspacePool = [runspacefactory]::CreateRunspacePool(1,10)
$RunspacePool.Open()

# Assign new jobs/runspaces to a variable
$Runspaces = foreach ($TestXML in $TestXMLs)
{
    # Create new PowerShell instance to hold the code to execute, add arguments
    $PSInstance = [powershell]::Create().AddScript({
        param($XMLPath)

        [xml]$XML = Get-Content $XMLPath
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }).AddParameter('XMLPath', $TestXML.FullName)

    # Assign PowerShell instance to RunspacePool
    $PSInstance.RunspacePool = $RunspacePool

    # Start executing asynchronously, keep instance + IAsyncResult objects
    New-Object psobject -Property @{
        Instance = $PSInstance
        IAResult = $PSInstance.BeginInvoke()
        Argument = $TestXML
    }
}

# Wait for the runspace jobs to complete
while($Runspaces |Where-Object{-not $_.IAResult.IsCompleted})
{
    Start-Sleep -Milliseconds 500
}

# Collect the results
$Results = $Runspaces |ForEach-Object {
    $Output = $_.Instance.EndInvoke($_.IAResult)
    New-Object psobject -Property @{
        File = $_.Argument
        ServerID = $Output
    }
}
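
The snippet above never releases the PowerShell instances or the pool. A minimal sketch of the cleanup step follows, shown on a small self-contained pool so it can be run on its own. The doubling script block and the names $Jobs, $Number and $Value are illustrative assumptions, not part of the original answer:

```powershell
# Self-contained sketch: run a trivial script block in a runspace pool, then clean up.
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 4)
$RunspacePool.Open()

$Jobs = foreach ($Number in 1..3)
{
    # Illustrative workload: double the input value
    $PSInstance = [powershell]::Create().AddScript({
        param($Value)
        $Value * 2
    }).AddParameter('Value', $Number)
    $PSInstance.RunspacePool = $RunspacePool

    [pscustomobject]@{
        Instance = $PSInstance
        IAResult = $PSInstance.BeginInvoke()
        Argument = $Number
    }
}

$Results = foreach ($Job in $Jobs)
{
    try
    {
        # EndInvoke() blocks until this instance has finished
        [pscustomobject]@{
            Argument = $Job.Argument
            Output   = $Job.Instance.EndInvoke($Job.IAResult)
        }
    }
    finally
    {
        # Release the instance even if EndInvoke() throws
        $Job.Instance.Dispose()
    }
}

# Release the pool once every instance has been collected
$RunspacePool.Close()
$RunspacePool.Dispose()
```

Because EndInvoke() blocks until the instance finishes, the polling loop is only needed if you want to do other work while waiting.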

Fast XML processing bonus tips:

As wOxxOm suggests, using Xml.Load() is way faster than using Get-Content to read in the XML document.
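
One way to check this on your own machine is to time both approaches against the same file. The sketch below generates a throwaway sample file; its structure, size and path are assumptions made for illustration, and the actual difference will depend on your files and hardware:

```powershell
# Build a throwaway sample file (hypothetical structure modelled on the question)
$SamplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'load-vs-get-content.xml'
$Servers = foreach ($i in 1..5000)
{
    '    <server name="Server{0}" serverid="{0}" />' -f $i
}
$Lines = @('<root>', '  <servers>') + $Servers + @('  </servers>', '</root>')
Set-Content -Path $SamplePath -Value $Lines -Encoding UTF8

# Variant 1: Get-Content emits one string per line, then the [xml] cast parses them
$CastTime = Measure-Command {
    [xml]$ViaCast = Get-Content $SamplePath
}

# Variant 2: XmlDocument.Load() reads and parses the file directly
$LoadTime = Measure-Command {
    $ViaLoad = New-Object System.Xml.XmlDocument
    $ViaLoad.Load($SamplePath)
}

'Get-Content + cast: {0:N0} ms' -f $CastTime.TotalMilliseconds
'XmlDocument.Load(): {0:N0} ms' -f $LoadTime.TotalMilliseconds

# Remove the sample file again
Remove-Item $SamplePath
```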

Furthermore, using dot notation ($xml.root.servers.server) and the Where({}) extension method is also going to be painfully slow if there are many servers or server nodes. Use the SelectNodes() method with an XPath expression to search for "Server1" instead (be aware that XPath is case-sensitive):

$PSInstance = [powershell]::Create().AddScript({
    param($XMLPath)

    $XML = New-Object Xml
    $XML.Load($XMLPath)
    $Server1Node = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')
    return $Server1Node.serverid
}).AddParameter('XMLPath', $TestXML.FullName)
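
The XPath expression above assumes that name is an attribute of server. If it is a child element in your files, the predicate would be server[name = "Server1"] instead. The following sketch illustrates the case sensitivity on a small inline document; the sample XML and the serverid values are made up for illustration:

```powershell
# Hypothetical sample document; the real files may be structured differently
$XML = New-Object System.Xml.XmlDocument
$XML.LoadXml(@'
<root>
  <servers>
    <server name="Server1" serverid="101" />
    <server name="server1" serverid="202" />
    <server name="Server2" serverid="303" />
  </servers>
</root>
'@)

# Exact, case-sensitive match on the name attribute: only the first node matches
$Exact = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')
$Exact.Count
$Exact.Item(0).serverid

# Case-insensitive match: lower-case the attribute with XPath 1.0 translate()
$Upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$Lower = 'abcdefghijklmnopqrstuvwxyz'
$AnyCase = $XML.SelectNodes(
    ('/root/servers/server[translate(@name, "{0}", "{1}") = "server1"]' -f $Upper, $Lower)
)
$AnyCase.Count
```

If you only expect a single match, SelectSingleNode() takes the same XPath expression and returns the first matching node or $null.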
