Why should I avoid using the increase assignment operator (+=) to create a collection

Question

The increase assignment operator (+=) is often used in [PowerShell] questions and answers on the StackOverflow site to construct a collection of objects, e.g.:

$Collection = @()
1..$Size | ForEach-Object {
    $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}

Yet it appears to be a very inefficient operation.

Is it OK to state in general that the increase assignment operator (+=) should be avoided for building an object collection in PowerShell?

Recommended answer

Yes, the increase assignment operator (+=) should be avoided for building an object collection.
Apart from the fact that using the += operator usually requires more statements (because of the array initialization = @()) and encourages storing the whole collection in memory rather than pushing it immediately into the pipeline, it is inefficient.

It is inefficient because every time you use the += operator, it effectively does:

$Collection = $Collection + $NewObject

Because arrays are immutable in terms of element count, the whole collection is recreated with every iteration: a new array is allocated and every existing element is copied into it.
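A minimal sketch of what that means in practice, using only built-in PowerShell. The variable names $Before and $SameObject are illustrative and not part of the original answer:

```powershell
$Collection = @(1, 2, 3)
$Before = $Collection     # a second reference to the same array object

# += allocates a new 4-element array, copies the 3 old elements
# and rebinds $Collection to the new array
$Collection += 4

$SameObject = [object]::ReferenceEquals($Before, $Collection)

$SameObject               # False: $Collection now points at a different array
$Before.Count             # 3: the old array was not modified
$Collection.Count         # 4
$Collection.IsFixedSize   # True: a .NET array cannot grow in place
```

Since the old array is left untouched and a new one is built, each += costs time proportional to the current size of the collection.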

The correct PowerShell syntax is:

$Collection = 1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
}

Note: as with other cmdlets, if there is just one item (iteration), the output will be a scalar and not an array. To force it to an array, you can either use the [Array] type: [Array]$Collection = 1..$Size | ForEach-Object { ... } or use the array subexpression operator @( ): $Collection = @(1..$Size | ForEach-Object { ... })
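The single-item case can be checked with a short sketch. The value $Size = 1 and the variable names $Scalar, $Forced and $Typed are assumptions chosen for the illustration:

```powershell
$Size = 1   # assumed: a run that produces exactly one item

# Plain assignment: a single pipeline output is stored as a scalar
$Scalar = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }

# Array subexpression operator: always yields an array
$Forced = @(1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} })

# Type-constrained variable: the single object is converted to an array
[Array]$Typed = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }

$Scalar -is [Array]   # False
$Forced -is [Array]   # True
$Typed -is [Array]    # True
```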

It is even recommended not to store the results in a variable ($a = ...) at all, but to pass them immediately into the pipeline to save memory, e.g.:

1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
} | Export-Csv .\Outfile.csv

Note: using the System.Collections.ArrayList class could also be considered. It is generally almost as fast as the PowerShell pipeline, but the disadvantage is that it consumes a lot more memory than (properly) using the PowerShell pipeline.
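For reference, a minimal sketch of the ArrayList variant. The value of $Size and the name $List are assumptions for the example, and the ::new() syntax assumes PowerShell 5 or later (New-Object System.Collections.ArrayList is the older equivalent):

```powershell
$Size = 1000   # assumed collection size for the example

$List = [System.Collections.ArrayList]::new()
foreach ($Index in 1..$Size) {
    # Add() appends in place and returns the new element's index;
    # assign that to $null so it does not end up in the output
    $null = $List.Add([PSCustomObject]@{Index = $Index; Name = "Name$Index"})
}

$List.Count   # 1000
```

Unlike +=, Add() does not rebuild the collection on every call, but the whole list still sits in memory, which is the disadvantage mentioned above.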

See also: Fastest way to get a uniquely indexed item from the property of an array

To show the relation between the collection size and the decrease in performance, you can check the following test results:

1..20 | ForEach-Object {
    $size = 1000 * $_
    $Performance = @{Size = $Size}
    $Performance.Pipeline = (Measure-Command {
        $Collection = 1..$Size | ForEach-Object {
            [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    $Performance.Increase = (Measure-Command {
        $Collection = @()
        1..$Size | ForEach-Object {
            $Collection  += [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    [pscustomobject]$Performance
} | Format-Table *,@{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize

 Size  Increase Pipeline Factor
 ----  -------- -------- ------
 1000   1554066   780590   1.99
 2000   4673757  1084784   4.31
 3000  10419550  1381980   7.54
 4000  14475594  1904888   7.60
 5000  23334748  2752994   8.48
 6000  39117141  4202091   9.31
 7000  52893014  3683966  14.36
 8000  64109493  6253385  10.25
 9000  88694413  4604167  19.26
10000 104747469  5158362  20.31
11000 126997771  6232390  20.38
12000 148529243  6317454  23.51
13000 190501251  6929375  27.49
14000 209396947  9121921  22.96
15000 244751222  8598125  28.47
16000 286846454  8936873  32.10
17000 323833173  9278078  34.90
18000 376521440 12602889  29.88
19000 422228695 16610650  25.42
20000 475496288 11516165  41.29

This means that with a collection size of 20,000 objects, using the += operator is about 40x slower than using the PowerShell pipeline.
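The growing factor fits a simple cost model: the k-th += has to copy the k-1 elements already in the array, so building N items takes 0 + 1 + ... + (N-1) = N(N-1)/2 element copies in total. The sketch below only does that arithmetic. Get-CopyCount is a hypothetical helper, and the model ignores allocation and garbage collection costs, so it illustrates the trend rather than predicting the measured ticks:

```powershell
# Hypothetical helper: element copies needed to build $Size items with +=
function Get-CopyCount {
    param([long]$Size)
    # Sum of 0..($Size - 1) = Size * (Size - 1) / 2
    [long]($Size * ($Size - 1) / 2)
}

Get-CopyCount 1000    # 499500
Get-CopyCount 20000   # 199990000
```

Under this model, 20 times as many items require roughly 400 times as many copies, whereas the pipeline does a roughly constant amount of work per item.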

Apparently some people struggle with correcting a script that already uses the increase assignment operator (+=). Therefore, I have created a short set of instructions for doing so:

  1. Remove all the <variable> += assignments from the iteration concerned and leave only the object item. By not assigning the object, the object will simply be put on the pipeline.
    It doesn't matter whether there are multiple increase assignments in the iteration or whether there are embedded iterations or functions; the end result will be the same.
    Meaning, this:

 

ForEach ( ... ) {
    $Array += $Object1
    $Array += $Object2
    ForEach ( ... ) {
        $Array += $Object3
        $Array += Get-Object

    }
}

is essentially the same as:

$Array = ForEach ( ... ) {
    $Object1
    $Object2
    ForEach ( ... ) {
        $Object3
        Get-Object

    }
}

Note: if there is no iteration, there is probably no reason to change your script, as it likely concerns only a few additions.

  2. Assign the output of the iteration (everything that is put on the pipeline) to the variable concerned. This is usually at the same level as where the array was initialized ($Array = @()), e.g.:

 

$Array = ForEach ( ... ) { ... }

Note 1: Again, if you want a single object to act as an array, you probably want to use the array subexpression operator @( ), but you might also consider doing this at the moment you use the array, like: @($Array).Count or ForEach ($Item in @($Array))
Note 2: Again, it is better not to assign the output at all but to pass the pipeline output directly to the next cmdlet to free up memory. A ForEach statement cannot be piped directly, so wrap it in a script block: & { ForEach ( ... ) { ... } } | Export-Csv .\File.csv.

  3. Remove the array initialization <Variable> = @()
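The three steps above can be sketched on a small, made-up script. The loop ranges, the strings and the variable names $Before and $After are assumptions for the illustration, not taken from a real script:

```powershell
# Before: array initialization plus += inside nested iterations
$Before = @()
foreach ($Group in 1..3) {
    $Before += "Group ${Group}"
    foreach ($Item in 1..2) {
        $Before += "Group ${Group}, item ${Item}"
    }
}

# After: the += assignments are removed (step 1), the loop output is
# assigned to the variable (step 2) and the initialization is gone (step 3)
$After = foreach ($Group in 1..3) {
    "Group ${Group}"
    foreach ($Item in 1..2) {
        "Group ${Group}, item ${Item}"
    }
}

# Both variants hold the same 9 strings in the same order
($Before -join '|') -eq ($After -join '|')   # True
```

Output from the inner loop simply flows into the output of the outer loop, which is why nested iterations need no special handling.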

For a full example, see: Comparing Arrays within Powershell
