Why should I avoid using the increase assignment operator (+=) to create a collection
Question
The increase assignment operator (+=) is often used in [PowerShell] questions and answers on the StackOverflow site to construct a collection of objects, e.g.:
$Collection = @()
1..$Size | ForEach-Object {
    $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Yet it appears to be a very inefficient operation.
Is it OK to state in general that the increase assignment operator (+=) should be avoided for building an object collection in PowerShell?
Recommended answer
Yes, the increase assignment operator (+=) should be avoided for building an object collection.
Apart from the fact that using the += operator usually requires more statements (because of the array initialization = @()) and encourages storing the whole collection in memory rather than pushing it directly into the pipeline, it is inefficient.
The reason it is inefficient is that every time you use the += operator, it effectively does:
$Collection = $Collection + $NewObject
Because arrays are immutable in terms of element count (they have a fixed size), the whole collection is recreated with every iteration.
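A minimal sketch of that behaviour, assuming nothing beyond standard .NET arrays (the variable names are illustrative only): after +=, the variable refers to a newly built array while the original array is left untouched.

```powershell
# Illustrative names; any PowerShell array behaves the same way.
$Original = 1, 2, 3
$Alias    = $Original      # second reference to the same array object

$Original.IsFixedSize      # True: a .NET array cannot grow in place

$Original += 4             # builds a new 4-element array and reassigns the variable

$Alias.Count                                   # 3: the old array was not modified
[object]::ReferenceEquals($Original, $Alias)   # False: $Original is now a different object
```

Each += therefore copies every existing element, which is why the cost grows with the size of the collection.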
The correct PowerShell syntax is:
$Collection = 1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Note: as with other cmdlets, if there is just one item (iteration), the output will be a scalar and not an array. To force it to an array, you might either use the [Array] type:
[Array]$Collection = 1..$Size | ForEach-Object { ... }
or use the array subexpression operator @( ):
$Collection = @(1..$Size | ForEach-Object { ... })
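The scalar case in the note can be checked with a short sketch. It assumes $Size = 1 so that there is exactly one iteration; the variable names $Plain, $Wrapped and $Typed are illustrative.

```powershell
$Size = 1   # a single iteration, chosen to trigger the scalar case

$Plain   = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }
$Wrapped = @(1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} })
[Array]$Typed = 1..$Size | ForEach-Object { [PSCustomObject]@{Index = $_; Name = "Name$_"} }

$Plain   -is [array]   # False: a single PSCustomObject
$Wrapped -is [array]   # True
$Typed   -is [array]   # True
```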
It is even recommended not to store the results in a variable ($a = ...) at all, but to pass them directly into the pipeline to save memory, e.g.:
1..$Size | ForEach-Object {
    [PSCustomObject]@{Index = $_; Name = "Name$_"}
} | Export-Csv .\Outfile.csv
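Why passing objects straight down the pipeline saves memory can be seen in a small sketch: each item is handed to the next stage as soon as it is produced, so the full collection never has to exist at once. The $Log list is a hypothetical helper that only records the order of events.

```powershell
# $Log is a hypothetical helper that records the order of events.
$Log = New-Object 'System.Collections.Generic.List[string]'

1..3 | ForEach-Object {
    $Log.Add("produce $_")   # the upstream stage creates an item
    $_                       # ...and emits it to the pipeline
} | ForEach-Object {
    $Log.Add("consume $_")   # the downstream stage handles it right away
}

$Log -join ', '
# produce 1, consume 1, produce 2, consume 2, produce 3, consume 3
```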
Note: Using the System.Collections.ArrayList class could also be considered. This is generally almost as fast as the PowerShell pipeline, but the disadvantage is that it consumes a lot more memory than (properly) using the PowerShell pipeline.
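A minimal sketch of the ArrayList alternative mentioned in the note, assuming a small illustrative $Size. Unlike an array, the list grows in place, but the whole collection is still held in memory.

```powershell
$Size = 5   # small illustrative size

$List = New-Object System.Collections.ArrayList
foreach ($Index in 1..$Size) {
    # Add() returns the index of the new element; discard it so it does not leak into the output.
    $null = $List.Add([PSCustomObject]@{Index = $Index; Name = "Name$Index"})
}

$List.Count      # 5
$List[0].Name    # Name1
```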
See also: Fastest way to get a unique index item from the property of an array
To show the relation between the collection size and the decrease in performance, you might check the following test results:
1..20 | ForEach-Object {
    $Size = 1000 * $_
    $Performance = @{Size = $Size}
    $Performance.Pipeline = (Measure-Command {
        $Collection = 1..$Size | ForEach-Object {
            [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    $Performance.Increase = (Measure-Command {
        $Collection = @()
        1..$Size | ForEach-Object {
            $Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
        }
    }).Ticks
    [pscustomobject]$Performance
} | Format-Table *,@{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize
 Size  Increase Pipeline Factor
 ----  -------- -------- ------
 1000   1554066   780590   1.99
 2000   4673757  1084784   4.31
 3000  10419550  1381980   7.54
 4000  14475594  1904888   7.60
 5000  23334748  2752994   8.48
 6000  39117141  4202091   9.31
 7000  52893014  3683966  14.36
 8000  64109493  6253385  10.25
 9000  88694413  4604167  19.26
10000 104747469  5158362  20.31
11000 126997771  6232390  20.38
12000 148529243  6317454  23.51
13000 190501251  6929375  27.49
14000 209396947  9121921  22.96
15000 244751222  8598125  28.47
16000 286846454  8936873  32.10
17000 323833173  9278078  34.90
18000 376521440 12602889  29.88
19000 422228695 16610650  25.42
20000 475496288 11516165  41.29
Meaning that with a collection size of 20,000 objects, using the += operator is about 40x slower than using the PowerShell pipeline.
Apparently some people struggle with correcting a script that already uses the increase assignment operator (+=). Therefore, I have written a short set of instructions for doing so:
- Remove all the <variable> += assignments from the concerned iteration and leave only the object item. By not assigning the object, the object will simply be put on the pipeline.
It doesn't matter if there are multiple increase assignments in the iteration or if there are embedded iterations or functions; the end result will be the same.
Meaning, this:
ForEach ( ... ) {
    $Array += $Object1
    $Array += $Object2
    ForEach ( ... ) {
        $Array += $Object3
        $Array += Get-Object
    }
}
Is essentially the same as:
$Array = ForEach ( ... ) {
    $Object1
    $Object2
    ForEach ( ... ) {
        $Object3
        Get-Object
    }
}
Note: if there is no iteration, there is probably no reason to change your script, as it likely concerns only a few additions.
- Assign the output of the iteration (everything that is put on the pipeline) to the concerned variable. This is usually at the same level as where the array was initialized ($Array = @()), e.g.:
$Array = ForEach ( ... ) { ...
Note 1: Again, if you want a single object to act as an array, you probably want to use the array subexpression operator @( ), but you might also consider doing this at the moment you use the array, like: @($Array).Count or ForEach ($Item in @($Array))
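Note 2: Again, you had better not assign the output at all but pass the pipeline output directly to the next cmdlet to free up memory: ... | ForEach-Object { ... } | Export-Csv .\File.csv. (The output of a ForEach statement cannot be piped directly; wrap the statement in & { ... } if you need to pipe it.)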
- Remove the array initialization <Variable> = @()
For a full example, see: Comparing Arrays within Powershell
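The three steps above can also be tried on a small, self-contained case. This is only a sketch: $Groups and $Items are hypothetical sample data, and plain strings stand in for the objects a real script would emit.

```powershell
# Hypothetical sample data, used only for illustration.
$Groups = 'A', 'B'
$Items  = 1..3

# Before: every += copies the whole array.
$Before = @()
foreach ($Group in $Groups) {
    $Before += "Group $Group"
    foreach ($Item in $Items) {
        $Before += "$Group$Item"
    }
}

# After: the += assignments are removed, the loop output is assigned once,
# and the @() initialization is gone.
$After = foreach ($Group in $Groups) {
    "Group $Group"
    foreach ($Item in $Items) {
        "$Group$Item"
    }
}

$After.Count                                    # 8
($Before -join ',') -eq ($After -join ',')      # True: same items, same order
```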