大型数据集的唯一计数公式 [英] Unique Count Formula for large dataset

查看:87
本文介绍了大型数据集的唯一计数公式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在确定将 1 0 输入相邻单元格以指示在处理大型数据集时值是否唯一的方法时遇到了麻烦.我已经阅读了多种方法来实现此目的,但是对于我来说,这些方法似乎都不有效:我使用的是Excel 2010实例(因此,我没有拥有不同的计数数据透视表中的功能,当我尝试使用PowerPivot时,由于处理限制,它会使我的文件崩溃.

I am having trouble determining a way to enter a 1 or 0 into an adjacent cell to indicate whether or not a value is unique when working with a large dataset. I have read of multiple methods for accomplishing this, however none of them seem efficient for my purposes: I am using an instance of Excel 2010 (so I do not have the Distinct Count feature in PivotTables, and when I try to use PowerPivot it crashes my file due to processing limitations.

在此StackOverflow问题中:要计算唯一计数的简单数据透视表值有一些建议使用 SUMPRODUCT COUNTIF ,但是当按原样使用50,000+行时,这将导致糟糕的性能,文件大小约为35MB,而不是〜3 MB.我想知道对于大型动态数据集,不管是公式还是VBA,是否有更好的解决方案.

In this StackOverflow question: Simple Pivot Table to Count Unique Values there are suggestions to use SUMPRODUCT or COUNTIF, but when working with 50,000+ rows as I am, this causes terrible performance and a file size of ~35 MB instead of ~3 MB. I wanted to know if there is a better solution for a large, dynamic dataset whether it is a formula or VBA.

我要完成的一个示例是(以 Unique 列为相邻单元格):

An example of what I would like to accomplish is (with the Unique column being the adjacent cell):

Name   Week   Unique
John   1      1
Sally  1      1
John   1      0
Sally  2      1

我试图编写 COUNTIF 的相同功能,但没有成功:

I attempted to script the same functionality of COUNTIF but with no success:

For Each Cell In ThisWorkbook.Worksheets("Overtime & Type Data").Range("Z2:Z" & DataLastRow)
If Worksheets("Overtime & Type Data").Cells(Cell.Row, 26) <> Worksheets("Overtime & Type Data").Cells(Cell.Row - 1, 26) Then
FirstCell = Cell.Row
End If
If (Worksheets("Overtime & Type Data").Range(Cells(FirstCell, 26), Cells(Cell.Row, 26)) = Worksheets("Overtime & Type Data").Range(Cells(Cell.Row, 26))) = True Then
    Cell.Value = 1
Else
    Cell.Value = 0
End If
Next Cell

推荐答案

此代码在不到3秒的时间内成功运行了130,000多行.调整列字母以适合您的数据集.

This code ran on over 130,000 rows successfully in less than 3 seconds. Adjust the column letters to fit your dataset.

Sub tgr()

    Const colName As String = "A"
    Const colWeek As String = "B"
    Const colOutput As String = "C"

    Dim ws As Worksheet
    Dim rngData As Range
    Dim DataCell As Range
    Dim rngFound As Range
    Dim collUniques As Collection
    Dim arrResults() As Long
    Dim ResultIndex As Long
    Dim UnqCount As Long

    Set ws = ThisWorkbook.Sheets("Overtime & Type Data")
    Set rngData = ws.Range(colName & 2, ws.Cells(Rows.Count, colName).End(xlUp))
    Set collUniques = New Collection
    ReDim arrResults(1 To rngData.Cells.Count, 1 To 1)

    On Error Resume Next
    For Each DataCell In rngData.Cells
        ResultIndex = ResultIndex + 1
        collUniques.Add ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value, ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value
        If collUniques.Count > UnqCount Then
            UnqCount = collUniques.Count
            arrResults(ResultIndex, 1) = 1
        Else
            arrResults(ResultIndex, 1) = 0
        End If
    Next DataCell
    On Error GoTo 0

    ws.Cells(rngData.Row, colOutput).Resize(rngData.Cells.Count).Value = arrResults

End Sub

这篇关于大型数据集的唯一计数公式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆