How to implement a parallel jobs and queues system


Question

I spent days trying to implement a parallel jobs and queues system, but I couldn't make it work. Here is the code with nothing implemented yet, and an example of the CSV it reads from.

I'm sure this post can help other users in their projects.

Each user has his own PC, so the CSV file looks like:

pc1,user1
pc2,user2
pc800,user800 

CODE:

#Source File:
$inputCSV = '~\desktop\report.csv'
$csv = import-csv $inputCSV -Header PCName, User
echo $csv #debug

#Output File:
$report = "~\desktop\output.csv"

#---------------------------------------------------------------

#Define search:
$findSize = 40GB
Write-Host "Looking for Outlook files larger than $($findSize/1GB) GB"

#count issues:
$issues = 0 

#---------------------------------------------------------------

foreach($item in $csv){

    if (Test-Connection -Quiet -count 1 -computer $($item.PCname)){

        $w7path = "\\$($item.PCname)\c$\users\$($item.User)\appdata\Local\microsoft\outlook"

        $xpPath = "\\$($item.PCname)\c$\Documents and Settings\$($item.User)\Local Settings\Application Data\Microsoft\Outlook"

            if(Test-Path $W7path){

                if(Get-ChildItem $w7path -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){

                    $newLine =  "{0},{1},{2}" -f $($item.PCname),$($item.User),$w7path
                    $newLine |  add-content $report

                    $issues ++
                    Write-Host "Issue detected" #debug
                    }
            }

            elseif(Test-Path $xpPath){

                if(Get-ChildItem $xpPath -Recurse -force -Include *.ost -ErrorAction "SilentlyContinue" | Where-Object {$_.Length -gt $findSize}){

                    $newLine =  "{0},{1},{2}" -f $($item.PCname),$($item.User),$xpPath
                    $newLine |  add-content $report

                    $issues ++
                    Write-Host "Issue detected" #debug
                    }
            }

            else{
                write-host "Error! - bad path"
            }
    }

    else{
        write-host "Error! - no ping"
    }
}

Write-Host "All done! detected $issues issues"


Answer

Parallel data processing in PowerShell is not quite simple, especially with queueing. Try using existing tools that already have this done. Take a look at the module SplitPipeline. The cmdlet Split-Pipeline is designed for parallel input data processing and supports queueing of input (see the parameter Load). For example, for 4 parallel pipelines with 10 input items each at a time, the code will look like this:

$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
    <operate on input item $_>
}} | Out-File $outputReport

All you have to do is implement the code <operate on input item $_>. Parallel processing and queueing are done by this command.

UPDATE for the updated question code. Here is the prototype code with some remarks. They are important: doing work in parallel is not the same as doing it directly, and there are some rules to follow.

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    # Tips
    # - Operate on the input object $_, i.e. $_.PCname and $_.User
    # - Use the imported variable $findSize
    # - Do not use Write-Host; use (for now) Write-Warning
    # - Do not count issues (for now). This is possible, but make it work
    #   without this at first.
    # - Do not write data to a file; writing from several parallel pipelines
    #   is not trivial. Just output the data; it will be piped further to
    #   the log file.
    ...
}} | Set-Content $report
# output from all jobs is joined and written to the report file
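As a concrete illustration, here is one way the prototype above might be filled in, adapting the question's original loop to the tips. This is a sketch and has not been run against the module itself; the UNC paths and CSV property names come from the question's code.

```powershell
$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    # Skip machines that do not answer a single ping
    if (!(Test-Connection -Quiet -Count 1 -ComputerName $_.PCname)) {
        Write-Warning "$($_.PCname): no ping"
        return
    }

    # Windows 7 and Windows XP profile locations from the question
    $w7path = "\\$($_.PCname)\c$\users\$($_.User)\appdata\Local\microsoft\outlook"
    $xpPath = "\\$($_.PCname)\c$\Documents and Settings\$($_.User)\Local Settings\Application Data\Microsoft\Outlook"

    $path = if (Test-Path $w7path) { $w7path }
            elseif (Test-Path $xpPath) { $xpPath }
            else { Write-Warning "$($_.PCname): bad path"; return }

    # Emit a CSV line; output from all pipelines is merged into $report
    if (Get-ChildItem $path -Recurse -Force -Include *.ost -ErrorAction SilentlyContinue |
        Where-Object { $_.Length -gt $findSize }) {
        "{0},{1},{2}" -f $_.PCname, $_.User, $path
    }
}} | Set-Content $report
```

Note that issues are reported by writing output lines rather than by incrementing a shared counter, and warnings replace Write-Host, following the tips above.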






UPDATE: How to write progress information

SplitPipeline handled a CSV with 800 targets pretty well, amazing. Is there any way to let the user know whether the script is alive? Scanning a big CSV can take about 20 minutes. Something like "in progress 25%", "50%", "75%"...

There are several options. The simplest is just to invoke Split-Pipeline with the switch -Verbose. You will get verbose messages about the progress and see that the script is alive.

Another simple option is to write and watch verbose messages from the jobs, e.g. Write-Verbose ... -Verbose, which writes messages even if Split-Pipeline is invoked without -Verbose.
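For example, a per-item heartbeat could look like this (a sketch; the message text is arbitrary):

```powershell
$csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
    # The explicit -Verbose forces the message through even when
    # Split-Pipeline itself is invoked without -Verbose
    Write-Verbose "Processing $($_.PCname)..." -Verbose

    # ... actual work on the input item $_ goes here ...
}} | Set-Content $report
```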

And another option is to use proper progress messages with Write-Progress. See the scripts:

  • Test-ProgressJobs.ps1
  • Test-ProgressTotal.ps1

Test-ProgressTotal.ps1 also shows how to use a collector updated from jobs concurrently. You can use a similar technique for counting issues (the original question code does this). When all is done, show the total number of issues to the user.
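One way to sketch such a collector, assuming variables imported with -Variable are shared across the pipelines as the answer describes, is a synchronized collection that each job appends to (see Test-ProgressTotal.ps1 for the module's own approach; this version is an untested illustration):

```powershell
# A thread-safe collection shared by all parallel pipelines
$issues = [System.Collections.ArrayList]::Synchronized(
    (New-Object System.Collections.ArrayList))

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable issues {process{
    # ... when an issue is detected on this item, record it:
    $null = $issues.Add($_.PCname)
}} | Set-Content $report

# When all pipelines have finished, report the total
Write-Host "All done! detected $($issues.Count) issues"
```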
