Find the statistical mode(s) of a dataset in PowerShell

Views: 51 | Published: 2021/6/19 20:52:04 | Tags: powershell, statistics


Question

This self-answered question is a follow-up to this question:

How can I determine a given dataset's (array's) statistical mode, i.e. the one value or the set of values that occur most frequently?

For instance, the array 1, 2, 2, 3, 4, 4, 5 has two modes, 2 and 4, because they are the values that occur most frequently.

Recommended answer

Use a combination of Group-Object, Sort-Object, and a do ... while loop:

# Sample dataset.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have 
# the highest member count.
$i = 0
do { $groups[$i].Group[0] } while ($groups[++$i].Count -eq $groups[0].Count)

The above yields 2 and 4, which are the two modes (values occurring most frequently, twice each in this case), sorted in ascending order (because Group-Object sorts by the grouping criterion and Sort-Object's sorting algorithm is stable).

Note: While this solution is conceptually straightforward, performance with large datasets may be a concern; see the bottom section for an optimization that is possible for certain inputs.
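
As a further option that is not part of the original answer, the counting can also be done in a single pass with a hashtable, which avoids sorting the groups. This is a sketch under stated assumptions: the function name Get-ModeByHashtable is hypothetical, the dataset is assumed to contain no $null elements (a $null hashtable key causes an error), and whether it is actually faster for a given dataset is untested here and worth checking with Measure-Command.

```powershell
# Alternative sketch (not from the original answer): count occurrences in a
# single pass with a hashtable instead of grouping and sorting.
# Get-ModeByHashtable is a hypothetical name; $null elements are not supported.
function Get-ModeByHashtable {
    param(
        # The dataset (array) whose mode(s) to determine.
        [object[]] $InputData
    )

    # An empty dataset has no mode.
    if ($null -eq $InputData -or $InputData.Count -eq 0) { return }

    # Keys are the original values (their type is preserved); values are
    # occurrence counts. A missing key yields $null, which [int] turns into 0.
    $counts = @{}
    foreach ($item in $InputData) {
        $counts[$item] = 1 + [int] $counts[$item]
    }

    # The highest occurrence count across all distinct values.
    $max = ($counts.Values | Measure-Object -Maximum).Maximum

    # Output every value that occurs $max times (hashtable order is unspecified).
    foreach ($entry in $counts.GetEnumerator()) {
        if ($entry.Value -eq $max) { $entry.Key }
    }
}
```

Called as Get-ModeByHashtable 1, 2, 2, 3, 4, 4, 5, it outputs 2 and 4 in no guaranteed order (append | Sort-Object if order matters). The values keep their original type, so no -as conversion is needed; note that @{} hashtables compare string keys case-insensitively.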

Explanation:

Group-Object groups all inputs by equality.

Sort-Object Count -Descending sorts the resulting groups by member count in descending order (most frequently occurring inputs first).

The do ... while statement loops over the sorted groups and outputs the input value each group represents, for as long as that group's member count, and therefore the value's occurrence count (frequency), equals the highest count, i.e., the first group's member count.
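
The three steps above can be packaged as a reusable function. This is a sketch under stated assumptions: the name Get-Mode is hypothetical, an empty dataset simply produces no output (the bare do ... while loop would otherwise fail when indexing into an empty result), and a foreach loop with break stands in for the do ... while loop, stopping at the first group whose count is lower than the top count.

```powershell
# Sketch: Get-Mode is a hypothetical helper name wrapping the
# Group-Object / Sort-Object approach shown above.
function Get-Mode {
    param(
        # The dataset (array) whose mode(s) to determine.
        [object[]] $InputData
    )

    # An empty dataset has no mode; bail out before indexing into
    # a nonexistent result.
    if ($null -eq $InputData -or $InputData.Count -eq 0) { return }

    # Group equal values, then sort the groups by member count, highest first.
    # @(...) guarantees an array even if there is only a single group.
    $groups = @($InputData | Group-Object | Sort-Object Count -Descending)

    # The first group's count is the highest occurrence count.
    $top = $groups[0].Count

    # Output the value of each group whose count ties with the top count;
    # stop at the first group with a lower count.
    foreach ($g in $groups) {
        if ($g.Count -ne $top) { break }
        $g.Group[0]
    }
}
```

For example, Get-Mode 1, 2, 2, 3, 4, 4, 5 outputs the two modes 2 and 4, while Get-Mode @() outputs nothing.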

Better-performing solution, for numbers and strings:

If the input elements are uniformly simple numbers or strings (as opposed to complex objects), an optimization is possible:

Group-Object's -NoElement switch suppresses collecting the individual inputs in each group.

Each group's .Name property reflects the grouping value, but only as a string, so it must be converted back to the original data type.
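
To see why the conversion step is needed, the following small check (an illustration added here, using a tiny made-up sample) inspects the type of .Name on a -NoElement group and converts it back with -as:

```powershell
# Illustration with a tiny made-up sample: group 1, 1, 2 without
# collecting the group members, and take the first group (for the value 1).
$group = @(1, 1, 2 | Group-Object -NoElement)[0]

# The grouping value is reported as a string, not as the original [int].
$group.Name.GetType().FullName     # System.String

# Convert it back using the type of the original elements, as the solution does.
$type = (1).GetType()
$value = $group.Name -as $type
$value.GetType().FullName          # System.Int32
```
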
# Sample dataset.
# Must be composed of all numbers or strings.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Determine the data type of the elements of the dataset via its first element.
# All elements are assumed to be of the same type.
$type = $dataset[0].GetType()

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object -NoElement | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have 
# the highest member count.
# -as $type converts the .Name string value back to the original type.
$i = 0
do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count)
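
Since the optimized solution is meant for strings too, one caveat applies: Group-Object compares strings case-insensitively by default, so values that differ only in case are counted as the same value. The sketch below, with a made-up sample dataset, contrasts the default with Group-Object's -CaseSensitive switch, reusing the same do ... while pattern; which behavior is right depends on your data.

```powershell
# Made-up string dataset; 'b' and 'B' differ only in case.
$dataset = 'a', 'b', 'B', 'c'
$type = $dataset[0].GetType()

# Default: case-insensitive grouping, so 'b' and 'B' form one group of 2
# and there is a single mode.
$groups = $dataset | Group-Object -NoElement | Sort-Object Count -Descending
$i = 0
$modesIgnoreCase = @(do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count))

# With -CaseSensitive, every value occurs exactly once, so all four are modes.
$groups = $dataset | Group-Object -NoElement -CaseSensitive | Sort-Object Count -Descending
$i = 0
$modesCaseSensitive = @(do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count))
```
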

