转换数组字母数字的每个成员的最快方法是什么? [英] What is the fastest way to turn every member of an array alphanumeric?

查看:91
本文介绍了转换数组字母数字的每个成员的最快方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

最终结果:

我想知道如果字符串更长,下面的结果是否会改变.我在同一台计算机上进行了完全相同的测试,除了每个单元格的随机字符串是34个字符而不是四个字符.这些是结果:

I was wondering whether the results below changed if the string was longer. I ran exactly the same tests on the same computer, except each cell had a random string of 34 characters rather than four. These were the results:

Comintern (Regexp):       136.1  ms  
brettdj (Regexp):         139.9  ms  
Slai (Regexp):            158.4  ms  
*Original Regex:          161.0  ms*    
Comintern (AN):           170.1  ms  
Comintern (Hash):         183.6  ms  
ThunderFrame:             232.9  ms    
*Original replace:        372.9  ms*  
*Original InStr:          478.1  ms*  
CallumDA33:              1218.1 ms

这确实表明了Regex的速度-所有使用Regex.replace的解决方案都显着提高了速度,其中最好的就是Comintern的实现.

This really shows the speed of the Regex - all the solutions utilising Regex.replace are significantly faster, with the best being Comintern's implementation.

总而言之,如果字符串较长,请使用数组;如果字符串较短,请使用剪贴板.如果不确定,最佳结果是使用数组,但这可能会牺牲短字符串的性能.

In summary, if the strings are long, use arrays, if they are short, use the clipboard. If unsure, the optimal result is to use arrays, but this may sacrifice a little performance on short strings.

最终结果:

非常感谢您的所有建议,很明显,我还有很多东西要学习.昨天我一直在考虑这件事,所以我决定在家里重新运行所有内容.这是根据将每个字符串应用于30,000个四个字符串得出的最终结果.

Thanks very much for all of your suggestions, clearly I still have a lot to learn. I was thinking about this all yesterday, so I decided to rerun everything at home. Here are the final results, based on applying each of these to 30,000 four character strings.

我家里的计算机是Intel i7 @ 3.6 GHz,8GB RAM,64位Windows 10和Excel2016.与以前相似的条件是我在后台运行进程,但是整个过程中我都没有积极地做任何事情测试.

My computer at home is an Intel i7 @ 3.6 GHz, 8GB RAM, 64-bit Windows 10 and Excel 2016. Similar conditions to before in that I have processes running in the background, but I'm not actively doing anything throughout the tests.

Original replace:  97.67  ms
Original InStr:    106.54 ms
Original Regex:    113.46 ms
ThunderFrame:      82.21  ms
Comintern (AN):    96.98  ms
Comintern (OR):    81.87  ms
Comintern (Hash):  101.18 ms
brettdj:           81.66  ms
CallumDA33:        201.64 ms
Slai:              68.38  ms

因此,我接受了Slai的回答,因为它显然是常规实施中最快的方法,但是我将根据实际数据重新运行它们,以检查其是否仍然有效.

I've therefore accepted Slai's answer as it is clearly the fastest for general implementation, but I'll rerun them all at work against the actual data to check this still works.

原始帖子:

我在Excel中有一个数组,该数组是零件编号的列表.例如,我需要将数组的每个成员都转换为字母数字

I have an array in Excel that is a list of part numbers. I need to turn every member of the array alphanumeric, for example

ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001  -> ABC123001

最快的方法是什么?

对于上下文,我们的零件号可以采用不同的形式,因此我正在编写一个函数,该函数在给定范围内找到最佳匹配.此刻,使所有字母数字内容运行的功能部分大约需要50毫秒才能运行,而该功能的其余部分总共需要30毫秒.我也无法避免使用Excel.

For context, our part numbers can come in different forms, so I'm writing a function that finds the best match within a given range. At the moment, the part of the function that makes everything alphanumeric takes about 50ms to run, whereas the rest of the function takes around 30ms in total. I also can't avoid using Excel.

我自己做了一些工作(请参阅下面的答案),但是主要问题是我必须一个接一个地遍历数组的每个元素-会有更好的方法吗?我也从未进行过测试,因此,对改进它们的任何反馈将不胜感激.

I've done some work myself (see answer below), but the main issue is that I have to loop through every element of the array one-by-one - could there be a better way? I've also never run tests before, so any feedback on improving them would be much appreciated.

这是到目前为止我尝试过的.

我正在使用

I'm using MicroTimer and my computer has an Intel i5 @2.5GHz, 4GB of RAM, 64-bit Windows 7. I've got processes running in the background, but I'm not actively doing anything else whilst these are run.

我使用以下代码创建了30,000行随机符号:

I created 30,000 lines of random symbols using this code:

=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))

(请注意,由于'='为char(61),我们如何将第一个字符停在60,并且我们要避免Excel将其解释为公式.另外,我们也强制第二个字符为数字,因此我们可以保证至少一个里面有字母数字字符.)

(note how we stop the first character at 60 because '=' is char(61) and we want to avoid Excel interpreting this as a formula. Also we force the second character to be a number so we can guarantee at least one alphanumeric character in there.)

1.使用基于案例的循环.平均时间:175毫秒

使用

Using the function in this post, we load the range into an array, apply the function to each element of the array and paste it back. Code:

Function AlphaNumericOnly(strSource As Variant) As String
    Dim i As Integer
    Dim strResult As String

    For i = 1 To Len(strSource)
        Select Case Asc(Mid(strSource, i, 1))
            Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                strResult = strResult & Mid(strSource, i, 1)
        End Select
    Next
    AlphaNumericOnly = strResult
End Function

Sub Replace()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("Replace")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim i As Integer
        For i = LBound(arr) To UBound(arr)
            arr(i, 1) = AlphaNumericOnly(arr(i, 1))
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub

2.使用InStr()检查每个字符.平均时间:201毫秒

定义一串有效值.如果有效值出现在数组元素中,请一一检查:

Define a string of valid values. Check one-by-one if the valid values appear in the array elements:

Sub InStr()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("InStr")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim validValues As String
        validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely'

    Dim i As Integer, j As Integer
    Dim result As String

        For i = LBound(arr) To UBound(arr)
        result = vbNullString
            For j = 1 To Len(arr(i, 1))
                If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then
                    result = result & Mid(arr(i, 1), j, 1)
                End If
            Next j
        arr(i, 1) = result
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub

3.在数组上使用regex.Replace.时间:171毫秒

定义一个正则表达式,并使用它替换数组中的每个元素.

Define a regex and use this to replace each element of the array.

Sub Regex()

    Dim inputSh As Worksheet
    Dim inputRng As Range
        Set inputSh = Sheets("Data")
        Set inputRng = inputSh.Range("A1:A30000")

    Dim outputSh As Worksheet
    Dim outputRng As Range
        Set outputSh = Sheets("Regex")
        Set outputRng = outputSh.Range("A1:A30000")

    Dim time1 As Double, time2 As Double
        time1 = MicroTimer

    Dim arr As Variant
        arr = inputRng

    Dim objRegex As Object
        Set objRegex = CreateObject("vbscript.regexp")
        With objRegex
            .Global = True
            .ignorecase = True
            .Pattern = "[^\w]"
        End With

    Dim i As Integer
        For i = LBound(arr) To UBound(arr)
            arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
        Next i

    outputRng = arr

    time2 = MicroTimer

    Debug.Print (time2 - time1) * 1000

End Sub


@ThunderFrame-我们的零件号通常采用以下格式:

@ThunderFrame - our part numbers generally come in the following formats:

  • 所有数字(例如32523452)
  • 字母和数字的混合字符(例如AB324K234或123H45645)
  • 由非字母数字字符(例如ABC001-001,ABC001/001、123/4557-121)链接的字母和数字的组合

在开始替换之前,我曾考虑过在每个字符串上使用regex.test,但是我不确定这是否只是复制字符串然后对其进行测试,在这种情况下,我也可以只替换为开始.

I have thought about using regex.test on each string before launching into the replacement, but I'm not sure if this will just copy the string to then test it, in which case I may as well just make the replacement to start with.

@Slai-感谢您的链接-我将对此进行更详细的研究

@Slai - thanks for the link - I will look into that in more detail

推荐答案

不确定这是否会更快,因为它取决于太多因素,但可能值得测试.您可以从剪贴板中获取复制的Range文本并立即替换所有值,而不必分别替换Regex.请注意,\w也会与下划线和Unicode字母匹配,因此在正则表达式中更加具体可以使其速度更快.

Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.

'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing

Dim r As Range, s As String
Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000
With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
   r.Copy
   .GetFromClipboard
    Application.CutCopyMode = False
    s = .GetText
    .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text"

    With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp")
        .Global = True
        '.IgnoreCase = False ' .IgnoreCase is False by default
        .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters
        s = .Replace(s, vbNullString)
    End With

    .SetText s
    .PutInClipboard
End With

' about 70% of the time is spent here in pasting the data 
r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1

'Debug.Print Timer - t

由于剪贴板的开销,我希望这对于较小的值会更慢,而由于所需的内存,对于更多的值可能会更慢.

I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.

禁用事件似乎对我的测试没有影响,但可能值得尝试.

Disabling events didn't seem to make difference in my tests, but might be worth trying.

请注意,在宏正在使用剪贴板时,其他应用程序使用剪贴板的可能性很小.

Note that there is a tiny chance of another application using the clipboard while the macro is using it.

如果早期绑定由于在不同的计算机上运行相同的已编译宏而导致问题,则可以搜索宏反编译器或删除引用并切换到后期绑定.

If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.

这篇关于转换数组字母数字的每个成员的最快方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆