Copy-Item using Invoke-Async in PowerShell


Question

This article shows how to use Invoke-Async in PowerShell: https://sqljana.wordpress.com/2018/03/16/powershell-sql-server-run-in-parallel-collect-sql-results-with-print-output-from-across-your-sql-farm-fast/

I wish to run the Copy-Item cmdlet in parallel in PowerShell, because the alternative is to use FileSystemObject via Excel and copy one file at a time, out of a total of millions of files.

I have cobbled together the following:

<#
.SYNOPSIS
<Brief description>
For examples type:
Get-Help .\<filename>.ps1 -examples
.DESCRIPTION
Copies files from one path to another
.PARAMETER FileList
e.g. C:\path\to\list\of\files\to\copy.txt
.PARAMETER NumCopyThreads
default is 8 (but can be 100 if you want to stress the machine to maximum!)
.EXAMPLE
.\CopyFilesToBackup -filelist C:\path\to\list\of\files\to\copy.txt
.NOTES
#>

[CmdletBinding()] 
Param( 
    [String] $FileList = "C:\temp\copytest.csv", 
    [int] $NumCopyThreads = 8
) 

$filesToCopy = New-Object "System.Collections.Generic.List[fileToCopy]"
$csv = Import-Csv $FileList

foreach($item in $csv)
{
    $file = New-Object fileToCopy
    $file.SrcFileName = $item.SrcFileName
    $file.DestFileName = $item.DestFileName
    $filesToCopy.add($file)
}

$sb = [scriptblock] {
    param($file)
    Copy-item -Path $file.SrcFileName -Destination $file.DestFileName
}
$results = Invoke-Async -Set $filesToCopy -SetParam file -ScriptBlock $sb -Verbose -Measure:$true -ThreadCount 8
$results | Format-Table

Class fileToCopy {
    [String]$SrcFileName = ""
    [String]$DestFileName = ""
}

the csv input for which looks like this:

SrcFileName,DestFileName
C:\Temp\dummy-data\101438\101438-0154723869.zip,\\backupserver\Project Archives\101438\0154723869.zip
C:\Temp\dummy-data\101438\101438-0165498273.xlsx,\\backupserver\Project Archives\101438\0165498273.xlsx

What am I missing to get this working, because when I run .\CopyFiles.ps1 -FileList C:\Temp\test.csv nothing happens. The files exist in the source path, but the file objects aren't being pulled from the -Set collection. (Unless I have misunderstood how the collection is used?)

No, I can't use robocopy to do this because there are millions of files which resolve to different paths depending upon their original location.

Answer

I have no explanation for your symptom based on the code in your question (see bottom section), but I suggest basing your solution on the (now) standard Start-ThreadJob cmdlet (comes with PowerShell Core; in Windows PowerShell, install it with Install-Module ThreadJob -Scope CurrentUser, for instance):

Such a solution is more efficient than use of the third-party Invoke-Async function, which as of this writing is flawed in that it waits for jobs to finish in a tight loop, which creates unnecessary processing overhead.

Start-ThreadJob jobs are a lightweight, thread-based alternative to the process-based Start-Job background jobs, yet they integrate with the standard job-management cmdlets, such as Wait-Job and Receive-Job.
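
For instance, here is a minimal sketch of the basic pattern (the throttle limit, the loop, and the output string are arbitrary placeholders, not part of the original question):

# Minimal sketch: start a few thread jobs, then use the standard job cmdlets on them.
$jobs = 1..4 | ForEach-Object {
  Start-ThreadJob -ThrottleLimit 2 -ArgumentList $_ {
    param($i)
    "Job $i ran on managed thread $([System.Threading.Thread]::CurrentThread.ManagedThreadId)"
  }
}

# Wait-Job / Receive-Job / Remove-Job work on thread jobs just like on regular background jobs.
$jobs | Wait-Job | Receive-Job
$jobs | Remove-Job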

Here's a self-contained example based on your code that demonstrates its use:

Note: Whether you use Start-ThreadJob or Invoke-Async, you won't be able to explicitly reference custom classes such as [fileToCopy] in the script block that runs in separate threads (runspaces; see the bottom section), so the solution below simply uses [pscustomobject] instances with the properties of interest, for simplicity and brevity.

# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList

# How many threads at most to run concurrently.
$NumCopyThreads = 8

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

# Import the CSV data and transform it to [pscustomobject] instances
# with only .SrcFileName and .DestFileName properties - they take
# the place of your original [fileToCopy] instances.
$jobs = Import-Csv $FileList | Select-Object SrcFileName, DestFileName | 
  ForEach-Object {
    # Start the thread job for the file pair at hand.
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList $_ { 
      param($f) 
      $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
      # Delay output for a random period.
      $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
      Start-Sleep -Milliseconds $randomSleepPeriodMs
      # Produce output.
      "Copied $($f.SrcFileName) to $($f.DestFileName)"
      # Wait for the remainder of the simulated runtime.
      Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
    }
  }

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

# Clean up the temp. file
Remove-Item $FileList

The output from the above is similar to the following:

Creating jobs...
Waiting for 10 jobs to complete...
Copied c:\tmp\b to \\server\share\b
Copied c:\tmp\g to \\server\share\g
Copied c:\tmp\d to \\server\share\d
Copied c:\tmp\f to \\server\share\f
Copied c:\tmp\e to \\server\share\e
Copied c:\tmp\h to \\server\share\h
Copied c:\tmp\c to \\server\share\c
Copied c:\tmp\a to \\server\share\a
Copied c:\tmp\j to \\server\share\j
Copied c:\tmp\i to \\server\share\i
Total time lapsed: 00:00:05.1961541

Note that the output received does not reflect the input order, and that the overall runtime is roughly 2 times the per-thread runtime of 2 seconds (plus overhead), because 2 "batches" have to be run due to the input count being 10, whereas only 8 threads were made available.

If you upped the thread count to 10 or more (50 is the default), the overall runtime would drop to 2 seconds plus overhead, because all jobs then run concurrently.
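
In the example above that simply means raising the value passed to -ThrottleLimit, e.g.:

# Allow all 10 sample jobs to run concurrently (the value 10 is just illustrative).
$NumCopyThreads = 10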

Caveat: The above numbers stem from running in PowerShell Core on Microsoft Windows 10 Pro (64-bit; Version 1903), using version 2.0.1 of the ThreadJob module.
Inexplicably, the same code is much slower in Windows PowerShell v5.1.18362.145.

However, for performance and memory consumption it is better to use batching (chunking) in your case, i.e., to process multiple file pairs per thread.

The following solution demonstrates this approach; tweak $chunkSize to find a batch size that works for you.

# Create sample CSV file with 10 rows.
$FileList = Join-Path ([IO.Path]::GetTempPath()) "tmp.$PID.csv"
@'
Foo,SrcFileName,DestFileName,Bar
1,c:\tmp\a,\\server\share\a,baz
2,c:\tmp\b,\\server\share\b,baz
3,c:\tmp\c,\\server\share\c,baz
4,c:\tmp\d,\\server\share\d,baz
5,c:\tmp\e,\\server\share\e,baz
6,c:\tmp\f,\\server\share\f,baz
7,c:\tmp\g,\\server\share\g,baz
8,c:\tmp\h,\\server\share\h,baz
9,c:\tmp\i,\\server\share\i,baz
10,c:\tmp\j,\\server\share\j,baz
'@ | Set-Content $FileList

# How many threads at most to run concurrently.
$NumCopyThreads = 8

# How many files to process per thread
$chunkSize = 3

# The script block to run in each thread, which now receives a
# $chunkSize-sized *array* of file pairs.
$jobScriptBlock = { 
  param([pscustomobject[]] $filePairs)
  $simulatedRuntimeMs = 2000 # How long each job (thread) should run for.
  # Delay output for a random period.
  $randomSleepPeriodMs = Get-Random -Minimum 100 -Maximum $simulatedRuntimeMs
  Start-Sleep -Milliseconds $randomSleepPeriodMs
  # Produce output for each pair.  
  foreach ($filePair in $filePairs) {
    "Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
  }
  # Wait for the remainder of the simulated runtime.
  Start-Sleep -Milliseconds ($simulatedRuntimeMs - $randomSleepPeriodMs)
}

Write-Host 'Creating jobs...'
$dtStart = [datetime]::UtcNow

$jobs = & {

  # Process the input objects in chunks.
  $i = 0
  $chunk = [pscustomobject[]]::new($chunkSize)
  Import-Csv $FileList | Select-Object SrcFileName, DestFileName | ForEach-Object {
    $chunk[$i % $chunkSize] = $_
    if (++$i % $chunkSize -ne 0) { return }
    # Note the need to wrap $chunk in a single-element helper array (, $chunk)
    # to ensure that it is passed *as a whole* to the script block.
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $chunk) -ScriptBlock $jobScriptBlock
    $chunk = [pscustomobject[]]::new($chunkSize) # we must create a new array
  }

  # Process any remaining objects.
  # Note: $chunk -ne $null returns those elements in $chunk, if any, that are non-null
  if ($remainingChunk = $chunk -ne $null) { 
    Start-ThreadJob -ThrottleLimit $NumCopyThreads -ArgumentList (, $remainingChunk) -ScriptBlock $jobScriptBlock
  }

}

Write-Host "Waiting for $($jobs.Count) jobs to complete..."

# Synchronously wait for all jobs (threads) to finish and output their results
# *as they become available*, then remove the jobs.
# NOTE: Output will typically NOT be in input order.
Receive-Job -Job $jobs -Wait -AutoRemoveJob
Write-Host "Total time lapsed: $([datetime]::UtcNow - $dtStart)"

# Clean up the temp. file
Remove-Item $FileList

While the output is effectively the same, note how only 4 jobs were created this time, each of which processed (up to) $chunkSize (3) file pairs.
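
To perform real copies rather than the simulated ones, the job script block could be adapted along these lines (a sketch only; the -Force switches and the New-Item call that pre-creates the destination directory are assumptions, not part of the original answer):

$jobScriptBlock = {
  param([pscustomobject[]] $filePairs)
  foreach ($filePair in $filePairs) {
    # Assumption: make sure the destination directory exists before copying.
    $destDir = Split-Path -Path $filePair.DestFileName -Parent
    if (-not (Test-Path -LiteralPath $destDir)) {
      New-Item -ItemType Directory -Path $destDir -Force | Out-Null
    }
    Copy-Item -LiteralPath $filePair.SrcFileName -Destination $filePair.DestFileName -Force
    # Emit a status line per pair so that Receive-Job has something to report.
    "Copied $($filePair.SrcFileName) to $($filePair.DestFileName)"
  }
}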

As for what you tried:

The screen shot you show suggests that the problem is that your custom class, [fileToCopy], isn't visible to the script block run by Invoke-Async.

Since Invoke-Async invokes the script block via the PowerShell SDK in separate runspaces that know nothing about the caller's state, it is to be expected that these runspaces don't know your class (this equally applies to Start-ThreadJob).
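
You can observe this in isolation with Start-ThreadJob (a sketch; the Demo class is just a stand-in for [fileToCopy], and the exact error text may vary):

class Demo { [string] $Name = 'hello' }

# Works in the caller's runspace:
[Demo]::new().Name

# Fails in the thread job's runspace, because the class isn't defined there:
Start-ThreadJob { [Demo]::new().Name } | Receive-Job -Wait -AutoRemoveJob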

However, it is unclear why that is a problem in your code, because your script block doesn't make an explicit reference to your class: your script-block parameter $file is not type-constrained (it is implicitly [object]-typed).

Therefore, simply accessing the properties of your custom-class instance inside the script block should work, and indeed does in my tests on Windows PowerShell v5.1.18362.145 on Microsoft Windows 10 Pro (64-bit; Version 1903).
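
A quick test of that claim, using Start-ThreadJob rather than Invoke-Async (a sketch; the sample property values are arbitrary):

class fileToCopy {
  [String] $SrcFileName = ''
  [String] $DestFileName = ''
}

$file = [fileToCopy]::new()
$file.SrcFileName  = 'C:\tmp\a'
$file.DestFileName = '\\server\share\a'

# The script block only accesses *properties* of the passed-in object; it never
# names the [fileToCopy] type, so the missing class definition in the job's
# runspace doesn't matter.
Start-ThreadJob -ArgumentList $file {
  param($f)
  "$($f.SrcFileName) -> $($f.DestFileName)"
} | Receive-Job -Wait -AutoRemoveJob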

However, if your real script-block code were to explicitly reference the custom class [fileToCopy] - such as by defining the parameter as param([fileToCopy] $file) - you would see the symptom.
