Want VBA in Excel to read a very large CSV and create an output file of a small subset of the CSV

Question

I have a CSV file of 1.2 million text records. The alphanumeric fields are wrapped in quotation marks; the date/time and numeric fields are not.

For example: "Fred","Smith",01/07/1967,2,"7, The High Street","Anytown","Anycounty","LS1 7AA"

What I want to do is write some VBA in Excel (more or less the only tool available to me that I am reasonably proficient with) that reads the CSV record by record, performs a check (as it happens, on the last field, the post code) and then outputs a small subset of the 1.2m records to a new output file.

I understand how to open the two files, read a record, do what I need to do with the data and write it out. (I will just output the input record with a prefix denoting an exception type.)

What I don't know is how to parse the CSV in VBA properly. I can't do a simple text scan and search for commas, because the text sometimes contains commas (which is why the text fields are quote-delimited).

Is there a fantastic command that would let me quickly get the data from the nth field in my record?

What I want is s_work = field(s_input_record,5), where 5 is the field number in my CSV.

Many thanks,

Recommended answer

The following code should do the trick. I don't have Excel in front of me, so I haven't tested it, but the concept is sound.

If this ends up being too slow, we can look at ways to improve the efficiency.

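' The file paths are passed in by the caller; the original left them undefined.
Sub SelectSomeRecords(inputFileName As String, outputFileName As String)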
    Dim testLine As String

    Open inputFileName For Input As #1
    Open outputFileName For Output As #2

    While Not EOF(1)
        Line Input #1, testLine
        If RecordIsInteresting(testLine) Then
            Print #2, testLine
        End If
    Wend

    Close #1
    Close #2
End Sub

Function RecordIsInteresting(recordLine As String) As Boolean
    Dim lineItems(1 to 8) As String

    ' Call a Sub without parentheses (or use Call); the parenthesised form does not compile.
    GetRecordItems lineItems, recordLine

    ' Do your custom checking here:
    RecordIsInteresting = (lineItems(8) = "LS1 7AA")
End Function

Sub GetRecordItems(items() As String, recordLine as String)
    Dim finishString as Boolean
    Dim itemString as String
    Dim itemIndex as Integer
    Dim charIndex as Long
    Dim inQuote as Boolean
    Dim testChar as String

    inQuote = False
    charIndex = 1
    itemIndex = 1
    itemString = ""
    finishString = False

    While charIndex <= Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)

        finishString = False

        If inQuote Then
            If testChar = Chr$(34) Then
                ' Closing quote: the field is complete.
                inQuote = False
                finishString = True
                charIndex = charIndex + 1 ' skip the comma that follows the closing quote
            Else
                itemString = itemString & testChar
            End If
        Else
            If testChar = Chr$(34) Then
                inQuote = True
            ElseIf testChar = "," Then
                finishString = True
            Else
                itemString = itemString & testChar
            End If
        End If

        If finishString Then
            items(itemIndex) = itemString
            itemString = ""
            itemIndex = itemIndex + 1
        End If

        charIndex = charIndex + 1
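    Wend

    ' An unquoted final field has no trailing comma, so store it here.
    If itemIndex <= UBound(items) Then
        items(itemIndex) = itemString
    End If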
End Sub
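
The question asks for something like s_work = field(s_input_record,5). As a minimal, untested sketch of that idea, the same character-by-character scan can be wrapped in a function that returns a single field. The name CsvField is hypothetical and not part of the answer above. The sketch assumes one record per line (no line breaks inside quoted fields) and, as an addition to the answer's logic, treats a doubled quote inside a quoted field as a literal quote character.

```vba
' Hypothetical helper: returns the fieldNumber-th field (1-based) of one CSV
' record, or "" if the record has fewer fields.
' Assumes the record contains no line breaks inside quoted fields.
Function CsvField(recordLine As String, fieldNumber As Long) As String
    Dim charIndex As Long
    Dim currentField As Long
    Dim inQuote As Boolean
    Dim testChar As String
    Dim itemString As String

    currentField = 1
    charIndex = 1

    Do While charIndex <= Len(recordLine)
        testChar = Mid$(recordLine, charIndex, 1)

        If inQuote Then
            If testChar = Chr$(34) Then
                If Mid$(recordLine, charIndex + 1, 1) = Chr$(34) Then
                    ' A doubled quote inside a quoted field is a literal quote.
                    itemString = itemString & Chr$(34)
                    charIndex = charIndex + 1
                Else
                    inQuote = False
                End If
            Else
                ' Commas inside quotes are ordinary characters.
                itemString = itemString & testChar
            End If
        ElseIf testChar = Chr$(34) Then
            inQuote = True
        ElseIf testChar = "," Then
            ' End of a field: return it if it is the one requested.
            If currentField = fieldNumber Then
                CsvField = itemString
                Exit Function
            End If
            currentField = currentField + 1
            itemString = ""
        Else
            itemString = itemString & testChar
        End If

        charIndex = charIndex + 1
    Loop

    ' The last field has no trailing comma.
    If currentField = fieldNumber Then CsvField = itemString
End Function
```

With this in place, the post code check could read s_work = CsvField(s_input_record, 8). The function stops scanning as soon as the requested field is complete, which may help a little when the field of interest is near the start of the record.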
