Excel - 计算与ID匹配的唯一值,针对100,000+个案例进行了优化 [英] Excel - Count unique values that match ID, optimized for 100,000+ cases

查看:310
本文介绍了Excel - 计算与ID匹配的唯一值,针对100,000+个案例进行了优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

参考下面的excel屏幕截图,我正在寻找一个公式解决方案,它计算列A中每个ID号的列B(颜色)中唯一值的数量。



我在列C中估算了所需的结果。因此,例如,ID 1(A2)只有一个唯一的颜色,Gray(B2)将在C2中返回1。 ID 2只有一种唯一的颜色,黄色(B3,B4),并在C3和C4中返回1。 ID 3,有两种独特的颜色,蓝色和紫色,从而在C5中返回2到C8。等等。



由于这将运行接近100,000行,所以遇到的许多基于索引和/或匹配的解决方案可能要花费很长的时间来计算。我发现ID值都是按升序排列的,可以通过启动公式来加快速度,即= IF(A2 = A1,或者这样的事情),提前告知任何有关如何注意:我正在处理也有近100列的文件,不需要帮助列的解决方案将是理想的。



编辑/添加:在我的主数据文件中,列B中存在空白单元格的实例。当计算列C结果时,是否有一种忽略空白单元格的方法?



解决方案

是一个VBA例程,应该为该数量的条目快速运行。我们创建一个类模块(用户定义的对象),它由与每个ID相关联的颜色的集合(Dictionary)组成,以及该颜色的计数。 (不要真的需要计数,但是添加它是微不足道的,以防你想要其他目的;还可以作为一些可以做的事情)。



然后,我们会在屏幕截图中显示相邻列中的结果。结果可以在其他地方输出,即使是在不同的工作表上,也会有不同的代码更改。



确保在模块开头读取注释,了解重要信息和关于设置正确。



课程模块






  Option Explicit 
'RENAME此模块:cID

私人pID As String
私人pColor As String
私人pColors为字典

公共属性获取ID()As String
ID = pID
结束属性
公共属性让ID(值作为字符串)
pID =值
结束属性

公共属性获取颜色()As String
颜色= pColor
结束属性
公共属性让颜色(值作为字符串)
pColor =值
结束属性

公共属性获取颜色()作为字典
设置颜色= pColors
结束属性
公共函数ADDColor(值作为字符串)
'也可以计算时间这个颜色分配
如果不是pColors.Exists(Value)然后
pColors.Add键:=值,项:= 1
Else
pColors(Value)= pColors(Value )+ 1
如果
结束函数

私有子类Class_Initialize()
设置pColors =新字典
End Sub






常规模块



编辑编辑以消除空白行计数






  Option Explicit 
'设置引用到Microsoft脚本运行时(工具/引用)

Sub IDColorCount()
Dim cID As cID,dID As字典
Dim wsData As Worksheet,rData As Range
Dim vData As Variant,vRes As Variant
Dim I As Long

'设置数据工作表和范围
'将数据读入数组以获得更快的计算
设置wsData =工作表(sheet1)
使用wsData
设置rData = .Range(.Cells(1,1),。细胞(.Rows。 Count,1).End(xlUp))。调整大小(columnsize:= 2)
vData = rData
结束

'查看数据并收集信息
设置dID =新字典
对于I = 2到UBound(vData,1)
如果不是vData(I,1)=然后
设置cID =新的cID
with cID
.ID = vData(I,1)
.Color = vData(I,2)
.ADDColor .Color

如果不是dID.Exists (.ID)然后
dID.Add Key:= .ID,Item:= cID
Else
dID(.ID).ADDColor .Color
End If
结束
结束如果
下一个我

'大小结果数组
ReDim vRes(1到UBound(vData),1到1)
vRes (1,1)=Count
对于I = 2到UBound(vData,1)
如果不是vData(I,1)=然后_
vRes(I,1 )= dID(CStr(vData(I,1)))。Colors.Count
下一个I

'结果可以写成anyplace
使用rData.Offset(0, 2).Resize(columnsize:= 1)
.EntireColumn.Clear
.Valu e = vRes
结束

结束子





in reference to the excel screen capture below, I'm looking for a formula solution that counts the number of unique values in Column B (Colour) for each ID number in Column A.

I've imputed the desired result in Column C. So, for instance, ID 1 (A2) has only one unique colour, Grey (B2), which would return a 1 in C2. ID 2 has only one unique colour, yellow (B3, B4), and returns 1 in C3 and C4. ID 3, has two unique colours, Blue and Purple, thus returning a 2 in C5 through C8. Etc.

Because this will be run for close to 100,000 rows, many of the index and/or match based solutions I've come across take way to long to compute. I have found that the ID values, which are all in ascending order, can be used to speed things up by starting the formula with =IF(A2=A1, or something like this. Thanks in advance to anyone with some ideas about how to solve this with a lean formula.

Note: I am working with files that also have close to 100 columns. A solution that doesn't require helper columns would be ideal.

EDIT/ADDITION: In my main data file, there are instances of blank cells in column B. Is there a way to ignore the blank cells when counting for column C results?

解决方案

Here is a VBA routine that should run quickly for that number of entries. We create a Class Module (User Defined Object) that consists of a collection (Dictionary) of Colors associated with each ID, and an count of that color. (Don't really need the count but it is trivial to add it in, in case you want it for some other purpose; and also as a demonstration of some of what could be done).

Then we output the results in the adjacent column, as you show in your screen shot. The results could be output elsewhere, even on a different worksheet, with minor code changes.

Be sure to read the notes at the beginning of the modules for important information and about setting things up properly.

Class Module


Option Explicit
'RENAME this module:  cID

Private pID As String
Private pColor As String
Private pColors As Dictionary

Public Property Get ID() As String
    ID = pID
End Property
Public Property Let ID(Value As String)
    pID = Value
End Property

Public Property Get Color() As String
    Color = pColor
End Property
Public Property Let Color(Value As String)
    pColor = Value
End Property

Public Property Get Colors() As Dictionary
    Set Colors = pColors
End Property
Public Function ADDColor(Value As String)
    'Might as well also count # of times this color assigned
    If Not pColors.Exists(Value) Then
        pColors.Add Key:=Value, Item:=1
    Else
        pColors(Value) = pColors(Value) + 1
    End If
End Function

Private Sub Class_Initialize()
    Set pColors = New Dictionary
End Sub


Regular Module

EDIT (edited to eliminate count for blank rows)


Option Explicit
'Set reference to Microsoft Scripting Runtime (Tools/References)

Sub IDColorCount()
    Dim cID As cID, dID As Dictionary
    Dim wsData As Worksheet, rData As Range
    Dim vData As Variant, vRes As Variant
    Dim I As Long

'Set the data worksheet and range
'Read the data into an array for faster calculations
Set wsData = Worksheets("sheet1")
With wsData
    Set rData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=2)
    vData = rData
End With

'Go through the data and collect the information
Set dID = New Dictionary
For I = 2 To UBound(vData, 1)
  If Not vData(I, 1) = "" Then
    Set cID = New cID
    With cID
        .ID = vData(I, 1)
        .Color = vData(I, 2)
        .ADDColor .Color

        If Not dID.Exists(.ID) Then
            dID.Add Key:=.ID, Item:=cID
        Else
            dID(.ID).ADDColor .Color
        End If
    End With
  End If
Next I

'Size the results array
ReDim vRes(1 To UBound(vData), 1 To 1)
vRes(1, 1) = "Count"
For I = 2 To UBound(vData, 1)
    If Not vData(I, 1) = "" Then _
        vRes(I, 1) = dID(CStr(vData(I, 1))).Colors.Count
Next I

'The results can  be written anyplace
With rData.Offset(0, 2).Resize(columnsize:=1)
    .EntireColumn.Clear
    .Value = vRes
End With

End Sub


这篇关于Excel - 计算与ID匹配的唯一值,针对100,000+个案例进行了优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆