Split huge text file and write as separate text files based on character

Question

I have a huge .txt file, around 500 MB, produced daily by my web apps. Each row has 21 fields separated by the pipe character |, and the file has more than 2 million rows. For speed, I currently split the input file by size; now I need to split it by a new Branch field that I am about to add.

'previous header
Date|Field_2|Field_3|Field_4|Field_5|Field_6|Field_7|Field_8|Field_9|Field_10|Field_11|Field_12|Field_13|Field_14|Field_15|Field_16|Field_17|Field_18|Field_19|Field_20|

'after adding the Branch field
Date|Branch|Field_2|Field_3|Field_4|Field_5|Field_6|Field_7|Field_8|Field_9|Field_10|Field_11|Field_12|Field_13|Field_14|Field_15|Field_16|Field_17|Field_18|Field_19|Field_20|


'I currently split the file with this code:
'script taken from http://prabhuram.com/articles/2012/02/28/splitting-large-files-using-vbscript/

Dim  Counter
Const InputFile = "C:\input.txt"
Const OutputFile = "C:\output"
Const RecordSize = 1000000 
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile (InputFile, ForReading)
Counter = 0
FileCounter = 0
Set objOutTextFile = Nothing

Do Until objTextFile.AtEndOfStream
    ' Start a new output file on the first line and after every RecordSize lines
    If Counter = 0 Or Counter = RecordSize Then
        Counter = 0
        FileCounter = FileCounter + 1
        If Not objOutTextFile Is Nothing Then objOutTextFile.Close
        Set objOutTextFile = objFSO.OpenTextFile(OutputFile & "_" & FileCounter & ".txt", ForWriting, True)
    End If
    strNextLine = objTextFile.ReadLine
    objOutTextFile.WriteLine strNextLine
    Counter = Counter + 1
Loop
objTextFile.Close
' objOutTextFile is still Nothing when the input file is empty
If Not objOutTextFile Is Nothing Then objOutTextFile.Close
Msgbox "Done..."

The code works fine, splitting the file every RecordSize = 1000000 rows. Now I want to improve it: with the new Branch field added for per-branch reporting, I want to split the huge file into separate output files based on the branch code (example branch codes: AAA, BBB, CCC, DDD, etc.). The input file is already sorted by branch, so the script does not need any further sorting.

One huge .txt file --> separate .txt files based on branch code, with each output file named after its branch code (e.g. AAA.txt, and so on).

Any idea how I can accomplish this using VBScript?

Recommended answer

You need to write to multiple files identified by your branch code. I'd probably use a dictionary for managing them, e.g. like this:

...

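' Maps each branch code to its open output file
Set outFiles = CreateObject("Scripting.Dictionary")

Do Until objTextFile.AtEndOfStream
  line = objTextFile.ReadLine

  ' Branch is the second pipe-delimited field (index 1)
  branchCode = Split(line, "|")(1)
  If Not outFiles.Exists(branchCode) Then
    ' First line seen for this branch: create its output file
    ' (objFSO is the FileSystemObject created in the question's script)
    outFiles.Add branchCode, objFSO.OpenTextFile(OutputFile _
      & "_" & branchCode & ".txt", ForWriting, True)
  End If

  outFiles(branchCode).WriteLine line
Loop

' Close every output file that was opened
For Each branchCode In outFiles.Keys
  outFiles(branchCode).Close
Next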

...
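
One thing to watch in the loop above: Split(line, "|")(1) raises a "Subscript out of range" error for any line that has no pipe character, such as a blank trailing line. A small guard avoids that. The helper name GetBranchCode is hypothetical, not part of the original answer, and this is only a sketch of one way to do it:

```vbscript
' Hypothetical helper (not from the original answer): returns the
' Branch field (second pipe-delimited column), or "" when the line
' has fewer than two fields.
Function GetBranchCode(line)
  Dim fields
  fields = Split(line, "|")
  If UBound(fields) >= 1 Then
    GetBranchCode = fields(1)
  Else
    GetBranchCode = ""
  End If
End Function

' Usage: skip lines that carry no branch code.
' branchCode = GetBranchCode(line)
' If branchCode <> "" Then ... write the line ...
```

Note that the header row also has a second field (the literal text Branch), so it would be written to its own output file unless it is skipped separately.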

Adjust the names of the output files as you see fit.
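
Because the question states that the input is already sorted by branch, only one output file has to be open at a time. The following is a minimal sketch of that variant, not the answer's own method. It names each file after its branch code (AAA.txt and so on), as the question asks. OutputDir is an assumed location, and the script assumes branch codes contain only characters that are valid in file names:

```vbscript
Option Explicit

Const InputFile = "C:\input.txt"   ' path taken from the question
Const OutputDir = "C:\"            ' assumption: folder for AAA.txt, BBB.txt, ...
Const ForReading = 1
Const ForWriting = 2

Dim objFSO, objTextFile, objOutTextFile
Dim line, fields, branchCode, currentBranch

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile(InputFile, ForReading)
Set objOutTextFile = Nothing
currentBranch = ""

Do Until objTextFile.AtEndOfStream
  line = objTextFile.ReadLine
  fields = Split(line, "|")

  ' Skip lines that have no Branch field (fewer than two columns).
  If UBound(fields) >= 1 Then
    branchCode = fields(1)

    ' Input is sorted by branch, so a new code means the previous
    ' branch is finished: close its file and open the next one.
    If branchCode <> currentBranch Or objOutTextFile Is Nothing Then
      If Not objOutTextFile Is Nothing Then objOutTextFile.Close
      Set objOutTextFile = objFSO.OpenTextFile( _
        objFSO.BuildPath(OutputDir, branchCode & ".txt"), ForWriting, True)
      currentBranch = branchCode
    End If

    objOutTextFile.WriteLine line
  End If
Loop

objTextFile.Close
If Not objOutTextFile Is Nothing Then objOutTextFile.Close
```

This relies on the sort order: if a branch code reappears later in the file, ForWriting reopens its file and overwrites the earlier contents, so the dictionary approach is the safer choice for unsorted input. As written, the header row ends up in Branch.txt.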
