VB.net Histogram - how to bin data


Question

I'm working on a histogram class, and in particular on a binning method.

I have two questions about it:

  1. Is this a correct/appropriate algorithm from a logical/statistical point of view?

  2. Is the code optimal, or at least decent? Please tell me how to improve it.

Any help is highly appreciated. Thanks in advance.

Here is my code so far:

Public Class Histo
Dim data() As Double
Dim bins As Integer = 0
Dim bw As Double = 0
Dim _min As Double = 0
Dim _max As Double = 0
Dim arrMax As Double = 0
Dim cht As Chart
Public Shared Decimals As Integer

Public Sub New(_arr() As Double, _cht As Chart)
    'One-dimensional array as data
    data = _arr

    'No of bins with Sturges method
    bins  = NoBin_ST(data)

    'calculate bin width
    bw = Range(data) / bins

    'bin boundaries for first bin
    _min = Min(data)
    _max = _min + bw

    'max of data
    arrMax = Max(data)

    'chart object
    cht = _cht

    'no of decimals on x-axis
    Decimals = Dec
End Sub

Public Function Binning() As Integer()
    'Binning "algorithm" for continuous data
    '
    'RETURN: one-dimensional array with n bins
    '
    Array.Sort(data)
    Dim j As Integer = 0
    Dim mn As Double = _min
    Dim mx As Double = _max
    Dim counter(bins-1) As Integer

    For i As Integer = 0 To data.GetLength(0)-1
        'check if data point is within the boundaries of the current bin
        If data(i) >= mn AndAlso data(i) < mx Then
            'add counter in current bin
            counter(j) += 1
        Else
            'special case: at the end at least one data point will equal max of the last bin
            ' and must be counted in that bin
            If data(i) = arrMax  Then
                counter(j) += 1
                Continue For
            End If
            'the data point has exceeded the boundaries of the previous bin
            ' and must be counted in the next bin
            'min and max is increased with the bin width
            mn += bw
            mx += bw
            'go to next bin
            j += 1
            'count data point in this bin and loop again
            counter(j) += 1
        End If
    Next
    Return counter
End Function

.....
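
The helper NoBin_ST is not shown in the post. As a minimal sketch, assuming it follows Sturges' rule in its common form k = ceil(log2(n)) + 1, the bin count and a direct index-based binning might look like the block below. The names SturgesBinCount and CountPerBin are hypothetical and not part of the original class. Computing each bin index directly from the value needs no sorting, and it still places values correctly when one or more bins in between are empty, a case the one-bin-at-a-time loop above does not appear to handle.

```vbnet
Imports System

Module BinningSketch
    ' Hypothetical helper: Sturges' rule, k = ceil(log2(n)) + 1.
    ' ceil(log2(n)) is computed with integers to avoid floating-point rounding.
    Public Function SturgesBinCount(n As Integer) As Integer
        If n < 1 Then Throw New ArgumentOutOfRangeException(NameOf(n))
        Dim p As Integer = 0
        Dim pow As Long = 1
        While pow < n
            pow *= 2
            p += 1
        End While
        Return p + 1
    End Function

    ' Hypothetical helper: counts per equal-width bin over [min, max].
    Public Function CountPerBin(data As Double(), bins As Integer) As Integer()
        If data Is Nothing OrElse data.Length = 0 Then
            Throw New ArgumentException("data must not be empty", NameOf(data))
        End If
        If bins < 1 Then Throw New ArgumentOutOfRangeException(NameOf(bins))

        ' Find the range of the data in one pass.
        Dim lo As Double = data(0)
        Dim hi As Double = data(0)
        For Each x As Double In data
            If x < lo Then lo = x
            If x > hi Then hi = x
        Next

        Dim counts(bins - 1) As Integer
        If hi = lo Then
            ' Assumption of this sketch: identical values all go into the first bin.
            counts(0) = data.Length
            Return counts
        End If

        Dim width As Double = (hi - lo) / bins
        For Each x As Double In data
            Dim idx As Integer = CInt(Math.Floor((x - lo) / width))
            ' The maximum value computes to index "bins"; keep it in the last bin.
            If idx > bins - 1 Then idx = bins - 1
            counts(idx) += 1
        Next
        Return counts
    End Function
End Module
```

For example, CountPerBin(New Double() {1, 2, 3, 10}, 3) uses a bin width of 3, puts the first three values into bin 0, leaves bin 1 empty, and counts 10 in the last bin.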

Recommended answer

I'm not sure whether this performs any better, but I think it is a bit simpler.

' Requires Imports System.Linq
Function CreateBins(values As IEnumerable(Of Double), numberOfBins As Integer) As IGrouping(Of Integer, Double)()
    If values Is Nothing Then Throw New ArgumentNullException(NameOf(values))
    If values.Distinct().Count() < 2 Then Throw New ArgumentException("Values must contain at least two distinct elements", NameOf(values))
    If numberOfBins < 1 Then Throw New ArgumentOutOfRangeException(NameOf(numberOfBins), "numberOfBins must be an integer >= 1")

    Dim min = values.Min()
    Dim max = values.Max()
    ' Checking for two distinct elements rules out min = max and therefore binSize = 0
    Dim binSize = (max - min) / numberOfBins

    ' The bin index is floor((x - min) / binSize). The maximum value would compute to
    ' index numberOfBins, so it is clamped into the last bin (numberOfBins - 1).
    Dim bins = values.GroupBy(Function(x) Math.Min(Convert.ToInt32(Math.Floor((x - min) / binSize)), numberOfBins - 1)).ToArray()

    ' Group counts are available using the IEnumerable Count function
    ' Dim counts = bins.Select(Function(x) x.Count())
    ' Or, retaining the group key
    ' Dim counts = bins.Select(Function(x) New With {Key x.Key, .Count = x.Count()})

    Return bins
End Function

Each bin is now a group. The original values are retained as part of the group, which allows follow-up analysis. The number of values in a bin is available through the group's Count() function.
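
One thing to keep in mind when plotting: GroupBy only produces a group for keys that actually occur, so a bin with no values has no group at all. As a minimal sketch, assuming a chart needs one count per bin including the empty ones, the groups could be expanded into a dense array as below. The name ToDenseCounts is hypothetical, and clamping out-of-range keys into the first or last bin is an assumption of this sketch rather than part of the answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module BinCountsSketch
    ' Hypothetical helper: one count per bin index 0 .. numberOfBins - 1,
    ' with zero for bins that have no group.
    Public Function ToDenseCounts(groups As IEnumerable(Of IGrouping(Of Integer, Double)),
                                  numberOfBins As Integer) As Integer()
        If groups Is Nothing Then Throw New ArgumentNullException(NameOf(groups))
        If numberOfBins < 1 Then Throw New ArgumentOutOfRangeException(NameOf(numberOfBins))

        Dim counts(numberOfBins - 1) As Integer
        For Each g As IGrouping(Of Integer, Double) In groups
            ' Assumption: keys outside the valid range are clamped to the nearest bin.
            Dim idx As Integer = Math.Min(Math.Max(g.Key, 0), numberOfBins - 1)
            counts(idx) += g.Count()
        Next
        Return counts
    End Function
End Module
```

With the values 1, 2, 3 and 10 grouped by floor((x - 1) / 3), there are groups for keys 0 and 3 only. ToDenseCounts with three bins then yields 3, 0 and 1, where the key 3 has been clamped into the last bin.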
