Find corrupt lines in text files and write them behind the line above

Question

I have around 400 text files with circa 41,000 corrupt lines.

I am looking for an option (VBA maybe?) that finds these corrupt lines and essentially performs a backspace on each one, so that it is appended to the end of the line before it; the corruption was caused by an unwanted word wrap. A corrupt line can be recognized by the fact that it does not start with the letters TEQ.

Does anyone have an idea how and where to build a script like that? Search and replace does not work, since I obviously can't put a backspace into the replace field. Thanks in advance!

Example of corrupt lines:

TEQ;231232;OFNENJD;29840389;TPOS;
TEQ;54111232;O2D;29829;
TPOS;

Line 3 is the corrupt one: it belongs to line 2, but a word wrap split it off. I need to perform a backspace to put it back at the end of line 2, and that is what I'd like to automate.
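
If a small script outside Office is acceptable, the rule stated in the question (a line that does not start with TEQ belongs to the line before it) can be sketched in Python as follows. This is not the VBA approach from the answer. The function name join_wrapped_lines, the choice to drop blank lines, and the CRLF output line endings are assumptions made for illustration.

```python
def join_wrapped_lines(text, prefix="TEQ"):
    """Append every line that does not start with `prefix` to the line before it."""
    # Treat CRLF, bare LF and bare CR all as line breaks.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    fixed = []
    for line in lines:
        if not line:
            continue  # blank line, or the empty piece after a final newline
        if fixed and not line.startswith(prefix):
            fixed[-1] += line  # corrupt line: undo the unwanted word wrap
        else:
            fixed.append(line)  # a good record (or a leading fragment with no line above it)
    # Assumption: the output should use Windows (CRLF) line endings.
    return "".join(line + "\r\n" for line in fixed)
```

For the sample above, the third line TPOS; ends up appended to TEQ;54111232;O2D;29829;. One limitation applies to any prefix-based rule: if a wrapped fragment happens to begin with TEQ, it cannot be told apart from a real record.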

Recommended answer

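To isolate the bad end-of-lines, first temporarily replace the good end-of-lines (here, a ; followed by vbCrLf and the next record's leading TEQ;) with a placeholder character that does not otherwise occur in the text. You can then remove every remaining vbCrLf or vbLf, which has the same effect as backspacing them away. The last step is to restore the good end-of-lines by replacing the placeholder again.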

Dim str As String
'use your favorite method to read the TXT file into the str variable
str = Replace(str, Chr(59) & vbCrLf & "TEQ;", ChrW(8203))  'convert good eols to a Unicode zero-width space (U+200B)
str = Replace(str, vbCr, vbNullString)   'remove bad eols, whether they are vbCrLf...
str = Replace(str, vbLf, vbNullString)   '...or just vbLf
str = Replace(str, ChrW(8203), Chr(59) & vbCrLf & "TEQ;")  'revert back to good eols
'write the str back to the TXT file
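
For readers who want to try the placeholder idea outside VBA, here is a minimal Python sketch of the same three passes. Like the VBA, it assumes that U+200B never appears in the data and that a good end-of-line is ; + CRLF + TEQ;. The function name fix_with_placeholder is hypothetical.

```python
ZWSP = "\u200b"  # zero-width space, the temporary placeholder (same as ChrW(8203))
GOOD_EOL = ";\r\nTEQ;"  # a record ends with ";" and the next one starts with "TEQ;"


def fix_with_placeholder(text):
    # 1. Hide every good end-of-line behind the placeholder.
    text = text.replace(GOOD_EOL, ZWSP)
    # 2. Any CR or LF still present is a bad (wrapped) break: delete it.
    text = text.replace("\r", "").replace("\n", "")
    # 3. Put the good end-of-lines back.
    return text.replace(ZWSP, GOOD_EOL)
```

Because only end-of-lines followed by TEQ; are protected, a file's final line break (if it has one) is removed as well, in this sketch and in the VBA above.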

It wouldn't be a bad idea to open a few of the .TXT files in a hex editor to determine whether the bad end-of-lines are vbCrLf (Chr(13) & Chr(10)) or just vbLf (Chr(10)). The same goes for the good end-of-lines, although I suspect the good ones will be vbCrLf and the bad ones just vbLf.
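
If no hex editor is at hand, a short Python sketch can do the same count. It reads the file as raw bytes, so no line-ending translation happens on the way in. The names count_line_endings and count_line_endings_in_file are hypothetical.

```python
def count_line_endings(data):
    """Count CRLF, bare-LF and bare-CR line breaks in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                         # vbCrLf
        "LF only": data.count(b"\n") - crlf,  # bare vbLf
        "CR only": data.count(b"\r") - crlf,  # bare vbCr
    }


def count_line_endings_in_file(path):
    # "rb" returns the bytes exactly as stored, with no newline translation.
    with open(path, "rb") as f:
        return count_line_endings(f.read())
```

If a file's "LF only" count roughly matches its number of corrupt lines while "CRLF" matches its number of records, that would support the suspicion above.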

The following Sub procedure requires that you go into the VBE's Tools ► References and add Microsoft Scripting Runtime to the project.

Sub fix_TEQ_text()
    Dim str As String, fp As String, fn As String
    Dim fso As New FileSystemObject, ts As TextStream

    fp = Environ("TEMP")    'folder that holds the TXT files
    fn = Dir(fp & Chr(92) & "TEQ*.txt", vbNormal)

    Do While CBool(Len(fn))
        'skip output files written by an earlier run
        If Not CBool(InStr(1, fn, "_fixed", vbTextCompare)) Then
            Set ts = fso.OpenTextFile(fp & Chr(92) & fn, ForReading)
            str = ts.ReadAll
            ts.Close

            str = Replace(str, Chr(59) & vbCrLf & "TEQ;", ChrW(8203))  'convert good eols to a Unicode zero-width space
            str = Replace(str, vbCr, vbNullString)   'remove bad eols (the CR of a vbCrLf)...
            str = Replace(str, vbLf, vbNullString)   '...and any remaining LF
            str = Replace(str, ChrW(8203), Chr(59) & vbCrLf & "TEQ;")  'revert back to good eols

            'vbTextCompare so that e.g. TEQ1.TXT becomes TEQ1_fixed.txt instead of overwriting the original
            Set ts = fso.CreateTextFile(fp & Chr(92) & Replace(fn, ".txt", "_fixed.txt", , , vbTextCompare), True)
            ts.Write str
            ts.Close

        End If
        fn = Dir
    Loop
End Sub

You will want to change the folder path (fp) and the file mask (currently "TEQ*.txt", which matched my sample TXT files).
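
If VBA is not a requirement, the batch run can also be sketched in Python. It walks a folder much like the Sub above, using the same TEQ*.txt mask, the same _fixed suffix, and the same rule for skipping earlier output. It applies the line-based TEQ rule, not the placeholder replacement. The folder argument, the latin-1 decoding (which round-trips single-byte and UTF-8 files byte for byte, but not UTF-16) and the CRLF output are assumptions, and all function names are hypothetical.

```python
from pathlib import Path


def join_wrapped_lines(text, prefix="TEQ"):
    """Append every line that does not start with `prefix` to the line before it."""
    # Split only on CR/LF (not str.splitlines), so latin-1-decoded bytes such as
    # 0x85 are never mistaken for line breaks.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    fixed = []
    for line in lines:
        if not line:
            continue  # blank line, or the empty piece after a final newline
        if fixed and not line.startswith(prefix):
            fixed[-1] += line  # wrapped fragment: undo the unwanted line break
        else:
            fixed.append(line)
    return "".join(line + "\r\n" for line in fixed)  # assumption: CRLF output


def fixed_name(src):
    """TEQ1.txt -> TEQ1_fixed.txt, keeping the extension's original case."""
    return src.with_name(src.stem + "_fixed" + src.suffix)


def fix_folder(folder, pattern="TEQ*.txt"):
    """Write a *_fixed copy of every matching file in `folder`; return the new paths."""
    written = []
    # sorted() builds the full list first, so files written below are not revisited.
    for src in sorted(Path(folder).glob(pattern)):
        if not src.is_file() or "_fixed" in src.stem.lower():
            continue  # output of an earlier run, skipped like the InStr check in the VBA
        # latin-1 maps each byte to exactly one character, so decoding never fails
        # and re-encoding restores the untouched bytes; newline="" disables
        # Python's own newline translation on both read and write.
        with open(src, encoding="latin-1", newline="") as f:
            text = f.read()
        dst = fixed_name(src)
        with open(dst, "w", encoding="latin-1", newline="") as f:
            f.write(join_wrapped_lines(text))
        written.append(dst)
    return written
```

For example, fix_folder(r"C:\path\to\files") (a placeholder path) would write TEQ1_fixed.txt next to TEQ1.txt and return the list of paths it wrote.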