Need to make a PowerShell script faster

Question

I taught myself PowerShell, so I don't know everything about it.

I need to search a predefined database (it contains more than 11,800 entries) for the entries that exactly match what I have put in.

Can you please help me find what is making this slow?

Code:

$Dict = Get-Content "C:\Users\----\Desktop\Powershell Program\US.txt"

if($Right -ne "") {
    $Comb = $Letter + $Right
    $total = [int]0    
    $F = ""

    do {
        $F = $Dict | Select-Object -Index $total
        if($F.Length -eq $Num) {
            if($F.Chars("0") + $F.Chars("1") -eq $Comb) {
                Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt" "$F"
            }
        }
        $total++
        Write-Host $total
    } until([int]$total -gt [int]118619)

    $total = [int]0
    $F = ""
}

How can I speed up this line-by-line searching/matching process? Should I use multithreading? If so, how?

Solution

It seems like you knew at least one other language before PowerShell, and you are starting out by basically replicating what you would have done in that language. That's a great way to learn a new language, but of course in the beginning you might end up with methods that are a bit strange or not performant.

So first I want to break down what your code is actually doing, as a rough overview:

  1. Read every line of the file at once and store it in the $Dict variable.
  2. Loop the same number of times as there are lines.
  3. In each iteration of the loop:

    1. Get the single line that matches the loop iteration (essentially through another iteration, rather than indexing, more on that later).
    2. Get the first character of the line, then the second, then combine them.
    3. If that's equal to a pre-determined string, append this line to a text file.
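Step 1 is worth seeing on its own, because it is what makes direct indexing possible later. The sketch below writes a few made-up words to a temporary file (a stand-in for the asker's US.txt, which we don't have) and reads them back with Get-Content. $Dict ends up as an ordinary array of lines.

```powershell
# Write a few made-up words to a temporary file (stand-in for US.txt).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'cat', 'car', 'dog', 'cart'

# Step 1: read every line at once. With more than one line, this is an array.
$Dict = Get-Content -Path $path

"Type: $($Dict.GetType().Name); lines: $($Dict.Count)"

# Clean up the temporary file. $Dict keeps its contents in memory.
Remove-Item -Path $path
```

One caveat: if the file has only one line, Get-Content returns a single string rather than an array. Writing @(Get-Content ...) keeps the type predictable.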

Step 3-1 is what's really slowing this down

To understand why, you need to know a little bit about pipelines in PowerShell. Cmdlets that accept and work on pipelines take one or more objects, but they process a single object at a time. They don't even have access to the rest of the pipeline.

This is also true for the Select-Object cmdlet. Suppose you pipe an array with 18,500 objects in it into Select-Object -Index 18000. The cmdlet has to receive and inspect 18,000 objects before it can give you the one you want, because indexes are zero-based and index 18000 is the 18,001st object. The larger the index, the longer this takes.
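To make the one-object-at-a-time point concrete, here is a hypothetical, simplified function that behaves like Select-Object -Index. Get-ItemAtIndex is a name invented for this illustration. It is not how Select-Object is actually implemented. It only shows that a process block sees each object in turn, so it has to count its way up to the requested index.

```powershell
# Hypothetical, simplified stand-in for "Select-Object -Index". It is NOT the
# real Select-Object source; it only models how a pipeline cmdlet works.
function Get-ItemAtIndex {
    param(
        [Parameter(Mandatory)][int]$Index,
        [Parameter(ValueFromPipeline)]$InputObject
    )
    begin { $seen = 0 }
    process {
        # Runs once per incoming object; it cannot jump ahead in the array.
        if ($seen -eq $Index) {
            # Record how many objects had to be received and skipped first.
            $script:skippedBeforeMatch = $seen
            $InputObject
        }
        $seen++
    }
}

$items = 1..10 | ForEach-Object { "zzzz~$_" }
$found = $items | Get-ItemAtIndex -Index 7
"Found '$found' after skipping $script:skippedBeforeMatch objects"
```

With -Index 7, seven objects are received and discarded before the eighth one is emitted. The same pattern at index 18000 means 18,000 objects are handled first.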

Since you already have an array, you can directly access any array member by index with square brackets [], like so:

$Dict[18000]

For a given array, that takes the same amount of time no matter what the index is.
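A quick way to check that the two approaches return the same item is to run both against sample data. The sketch below uses a 19,000-item array built the same way as the test array in the proof, standing in for the asker's $Dict.

```powershell
# Sample data standing in for $Dict (same shape as the proof's test array).
$Dict = 1..19000 | ForEach-Object { "zzzz~$_" }

# Direct indexing: zero-based, no pipeline involved.
$byIndex = $Dict[18000]

# Pipeline version: Select-Object receives the items one by one.
$bySelect = $Dict | Select-Object -Index 18000

"$byIndex / $bySelect"   # → zzzz~18001 / zzzz~18001
```

A single Select-Object call like this still finishes quickly. The cost becomes painful only when the call is repeated for every iteration of a loop.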

Now, for a single call to Select-Object -Index you probably aren't going to notice how long it takes, even with a very large index. The problem is that you're already looping through the entire array, so the cost compounds greatly.

You're essentially having to do the sum of 1..18000, which is $\frac{18000 \times 18001}{2} = 162{,}009{,}000$, or approximately 162,000,000 iterations! (Thanks to user2460798 for correcting my math.)
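The arithmetic is easy to verify in PowerShell itself. The sketch below adds up 1..18000 directly and compares the result with the closed-form n(n+1)/2.

```powershell
$n = 18000

# Brute force: add up 1..18000, one term per loop iteration.
$bruteForce = (1..$n | Measure-Object -Sum).Sum

# Closed form for the sum of 1..n: n(n+1)/2.
$closedForm = $n * ($n + 1) / 2

"$bruteForce $closedForm"   # → 162009000 162009000
```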

Proof

I tested this. First, I created an array with 19,000 objects:

$a = 1..19000 | %{"zzzz~$_"}

Then I measured both methods of accessing it. First, with select -index:

measure-command { 1..19000 | % { $a | select -Index ($_ - 1) } | out-null }

Result:

TotalMinutes      : 20.4383861316667
TotalMilliseconds : 1226303.1679

Then with the indexing operator ([]):

measure-command { 1..19000 | % { $a[$_-1] } | out-null }

Result:

TotalMinutes      : 0.00788774666666667
TotalMilliseconds : 473.2648

The results are pretty striking: using Select-Object takes nearly 2,600 times longer.

A counting loop

The above is what's causing your major slowdown, but I also wanted to point out something else.

In most languages, you would typically use a for loop to count. In PowerShell, it looks like this:

for ($i = 0; $i -lt $total ; $i++) {
    # $i has the value of the iteration
}

In short, there are three statements in the for loop. The first is an expression that gets run before the loop starts. $i = 0 initializes the iterator to 0, which is the typical usage of this first statement.

Next is a conditional. It is tested before each iteration, and the loop continues as long as it returns true. Here, $i -lt $total checks whether $i is less than the value of $total, some other variable defined elsewhere (presumably the maximum value).

The last statement gets executed on each iteration of the loop. $i++ is the same as $i = $i + 1 so in this case we're incrementing $i on each iteration.
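To connect this with the question's code, the sketch below counts from 0 to 4 in two ways. The first uses the do/until pattern with a manually incremented counter, as the question does. The second uses a for loop, which keeps the initializer, condition, and step together in its header. The upper bound of 4 is arbitrary and only keeps the output short.

```powershell
# The question's pattern: do/until with a manually managed counter.
$doUntil = [System.Collections.Generic.List[int]]::new()
$total = 0
do {
    $doUntil.Add($total)
    $total++
} until ($total -gt 4)

# The same counting as a for loop: initializer; condition; step.
$forLoop = [System.Collections.Generic.List[int]]::new()
for ($i = 0; $i -le 4; $i++) {
    $forLoop.Add($i)
}

"do/until: $($doUntil -join ',')  for: $($forLoop -join ',')"
```

The question's loop runs until ($total -gt 118619), so it visits 0 through 118619 inclusive. The equivalent for condition would be $i -le 118619. Looping up to $Dict.Count avoids hard-coding the size at all.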

It's a bit more concise than using a do/until loop, and it's easier to follow because the meaning of a for loop is well known.
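Putting the two suggestions together, here is one possible sketch of the question's search loop using a for loop and direct indexing. The word list and the values of $Letter, $Right and $Num are made-up sample data. In the question, these values come from Get-Content and from user input. Two further changes are my own substitutions and were not part of the original code. The two-character check uses StartsWith with a case-insensitive comparison instead of Chars, which also avoids errors on strings shorter than two characters. The matches are collected in a list rather than appended to Results.txt one line at a time.

```powershell
# Hypothetical sample input standing in for the question's variables.
$Dict   = 'cat', 'car', 'cart', 'care', 'dog', 'Cab'
$Letter = 'c'
$Right  = 'a'
$Num    = 3

$results = [System.Collections.Generic.List[string]]::new()

if ($Right -ne "") {
    $Comb = $Letter + $Right
    # Loop over the real length of the array instead of a hard-coded limit.
    for ($i = 0; $i -lt $Dict.Count; $i++) {
        $F = $Dict[$i]   # direct indexing instead of Select-Object -Index
        # Same criteria as the question: exact length, then the first two
        # characters (case-insensitive, like PowerShell's -eq).
        if ($F.Length -eq $Num -and
            $F.StartsWith($Comb, [System.StringComparison]::OrdinalIgnoreCase)) {
            $results.Add($F)
        }
    }
}

$results -join ','
```

With this sample data, the matches are cat, car and Cab. You could write them once after the loop, for example with $results | Add-Content "C:\Users\----\Desktop\Powershell Program\Results.txt". This avoids reopening the file for every hit. The per-iteration Write-Host $total is also left out here, since printing on every pass adds its own overhead.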

Other Notes

If you're interested in more feedback about working code you've written, have a look at Code Review. Please read the rules there carefully before posting.
