Extract text from a text file with Chinese characters using VBA


Question

I have a batch of about 100,000 text files that I would like to read into strings using VBA. In the past I have done it this way without problems:

Sub Main()
Dim PathAndName As String
Dim TextFile As Integer
Dim TextString() As String
Redim TextString(100000)
For i = 1 To 100000
    PathAndName = "C:\File_" & i & ".ext"
    TextFile = 1
    Open PathAndName For Input As TextFile
    TextString(i) = Input(LOF(TextFile), TextFile)
Next i
End Sub

This time, the script fails with run-time error 62, "Input past end of file". The only difference I can spot is that the text files now contain a few Chinese characters, which I am not actually interested in. That is why I believe they are the source of the problem. The Chinese characters appear on the first line of the files.

Any help is appreciated. Thanks!

Solution

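I suspect your text files are now in a multibyte encoding, where one character is encoded in two or three bytes. LOF(TextFile) therefore returns the byte count, not the character count. But Input(LOF(TextFile), TextFile) needs the character count, since it has to build a String of that many characters. With fewer characters than bytes in the file, it tries to read more characters than exist and runs past the end of the file.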
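To check this suspicion on one of the files, the byte count can be compared with the number of characters after decoding. The following is only a minimal sketch: the helper name CompareByteAndCharCount is hypothetical, and it assumes the file is encoded in the system's ANSI code page, which is what StrConv(..., vbUnicode) decodes from. If the file uses a different encoding (UTF-8, for example), the character count it prints will not be meaningful.

```vba
' Hypothetical diagnostic helper (not part of the original answer).
' Prints the byte count of a file next to the number of characters
' obtained by decoding it with the system ANSI code page (an assumption).
Sub CompareByteAndCharCount(ByVal PathAndName As String)
    Dim f As Integer
    Dim buf() As Byte
    Dim byteCount As Long
    Dim decoded As String

    f = FreeFile
    Open PathAndName For Binary Access Read As #f
    byteCount = LOF(f)              ' LOF always reports bytes
    If byteCount > 0 Then
        ReDim buf(0 To byteCount - 1)
        Get #f, , buf               ' read the raw bytes
        decoded = StrConv(buf, vbUnicode)
    End If
    Close #f

    Debug.Print "Bytes: " & byteCount & ", characters: " & Len(decoded)
End Sub
```

Calling it as CompareByteAndCharCount "C:\File_1.ext" and seeing fewer characters than bytes would be consistent with the explanation above.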

You could use:

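Sub Main()
Dim PathAndName As String
Dim TextFile As Integer
Dim TextString() As String
Dim sLine As String
Dim sTextString As String
Dim i As Long
ReDim TextString(100000)
For i = 1 To 100000
    PathAndName = "C:\File_" & i & ".ext"
    TextFile = FreeFile    ' next free file number instead of a hard-coded 1
    Open PathAndName For Input As #TextFile

    sTextString = ""

    ' Read line by line until EOF, so no character count is needed
    Do While Not EOF(TextFile)
        ' Line Input reads the whole line; Input # would split it at commas
        Line Input #TextFile, sLine
        ' Line Input drops the line break, so add it back
        sTextString = sTextString & sLine & vbCrLf
    Loop

    TextString(i) = sTextString

    Close #TextFile

Next i
End Sub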

A better approach would be to use ADODB.Stream instead of the dinosaur VB file access methods, since it lets you specify the character set of the file. That is a totally different approach, though, so you should read up on ADODB.Stream first.
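As a rough idea of what the ADODB.Stream route could look like, here is a minimal sketch rather than a definitive implementation. The function name ReadTextFile and the default charset "utf-8" are assumptions for illustration; the actual encoding of the files is not stated in the question, so the charset has to be adjusted to match (for example "gb2312" or "big5"). The object is created with late binding, so no project reference should be needed.

```vba
' Hypothetical helper: reads a whole text file into a String using ADODB.Stream.
' CharsetName is an assumption and must match the real encoding of the file.
Function ReadTextFile(ByVal PathAndName As String, _
                      Optional ByVal CharsetName As String = "utf-8") As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")   ' late binding

    stm.Type = 2                 ' 2 = adTypeText
    stm.Charset = CharsetName    ' decode the bytes with this character set
    stm.Open
    stm.LoadFromFile PathAndName
    ReadTextFile = stm.ReadText(-1)   ' -1 = adReadAll
    stm.Close
End Function
```

Inside the loop from the question, the read would then shrink to TextString(i) = ReadTextFile(PathAndName). Because the stream decodes the bytes itself, no byte or character count has to be passed in.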
