将Excel中的RTF格式(带有格式标记)转换为未格式化的文本 [英] Rich text format (with formatting tags) in Excel to unformatted text

查看:184
本文介绍了将Excel中的RTF格式(带有格式标记)转换为未格式化的文本的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我大约有. Excel中有12000个包含RTF(包括格式标记)的单元格.我需要解析它们以获取未格式化的文本.

I have approx. 12000 cells in excel containing RTF (including formatting tags). I need to parse them to get to the unformatted text.

这是带有文本的单元格之一的示例:

This is the example of one of the cells with text:

{\rtf1\ansi\deflang1060\ftnbj\uc1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset238
Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\fs24\cf2\cb1 Normal;}{\cs1\cf2\cb1 Default Paragraph Font;}}
\paperw11908\paperh16833\margl1800\margr1800\margt1440\margb1440\headery720\footery720
\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn11908\pghsxn16833\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440
\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\pard TPR 0160 000\par IPR 0160 000\par OB-R-02-28\par}

我真正需要的就是这个:

And all I really need is this:

TPR 0160 000
IPR 0160 000
OB-R-02-28

简单遍历单元格并删除不必要的格式的问题是,不是那些12000单元格中的所有内容都像这样简单.因此,我需要手动检查许多不同的版本并编写一些变体.到最后仍然需要做很多手工工作.

The problem with simple looping over the cells and removing unnecessary formatting is, that not everything in those 12000 cells is as straightforward as this is. So I would need to manually inspect many different versions and write several variations; and still at the end there would be a lot of manual work to do.

但是,如果我将一个单元格的内容复制到空文本文档中并另存为RTF,然后用MS Word打开它,它会立即解析文本,我得到的正是我想要的.不幸的是,对于12000个单元格来说这样做非常不方便.

But if I copy the contents of one cell to empty text document and save it as RTF, then open it with MS Word, it instantly parses the text and I get exactly what I want. Unfortunately it's extremely inconvenient to do so for a 12000 cells.

所以我在考虑VBA宏,将单元格内容移至Word,强制进行解析,然后将结果复制回原始单元格.不幸的是,我不确定该怎么做.

So I was thinking about VBA macro, to move cell contents to Word, force parsing and then copy the result back to the originating cell. Unfortunately I'm not really sure how to do it.

有人有什么主意吗?还是其他方法?我将非常感谢您提供解决方案或朝着正确的方向前进.

Does anybody has any idea? Or a different approach? I will be really grateful for a solution or a push in the right direction.

TNX!

推荐答案

如果您确实想遵循使用Word解析文本的方法,此功能将为您提供帮助.如评论所建议,您将需要对MS Word对象库的引用.

If you did want to go down the route of using Word to parse the text, this function should help you out. As the comments suggest, you'll need a reference to the MS Word Object Library.

Function ParseRTF(strRTF As String) As String
Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'
Dim f     As Integer       'Variable to store the file I/O number'

'File path for a temporary .rtf file'
Const strFileTemp = "C:\TempFile_ParseRTF.rtf"

'Obtain the next valid file I/O number'
f = FreeFile

'Open the temp file and save the RTF string in it'
Open strFileTemp For Output As #f
    Print #f, strRTF
Close #f

'Open the .rtf file as a Word.Document'
Set wdDoc = GetObject(strFileTemp)

'Read the now parsed text from the Word.Document'
ParseRTF = wdDoc.Range.Text

'Delete the temporary .rtf file'
Kill strFileTemp

'Close the Word connection'
wdDoc.Close False
Set wdDoc = Nothing
End Function

您可以使用类似于以下内容的方法为12,000个单元格中的每个单元格调用它:

You could call it for each of your 12,000 cells using something similar to this:

Sub ParseAllRange()
Dim rngCell As Range
Dim strRTF  As String

For Each rngCell In Range("A1:A12000")

    'Parse the cell contents'
    strRTF = ParseRTF(CStr(rngCell))

    'Output to the cell one column over'
    rngCell.Offset(0, 1) = strRTF
Next
End Sub

ParseRTF函数大约需要一秒钟才能运行(至少在我的机器上),因此对于12,000个单元格来说,这大约需要三个半小时才能解决.

The ParseRTF function takes about a second to run (on my machine at least), so for 12,000 cells this will work out at about three and a half hours.

在周末考虑了这个问题后,我确定有一个更好的(快速)解决方案.

Having thought about this problem over the weekend, I was sure there was a better (quicker) solution for this.

我记得剪贴板的RTF功能,意识到实现了一个可以将RTF数据复制到剪贴板,粘贴到word doc并输出结果纯文本的类.这种解决方案的好处在于,不必为每个rtf字符串打开和关闭word doc对象.它可以在循环之前打开,然后在循环之后关闭.

I remembered the RTF capabilities of the clipboard, and realised that a class could be created that would copy RTF data to the clipboard, paste to a word doc, and output the resulting plain text. The benefit of this solution is that the word doc object would not have to be opened and closed for each rtf string; it could be opened before the loop and closed after.

下面是实现此目的的代码.这是一个名为clsRTFParser的Class模块.

Below is the code to achieve this. It is a Class module named clsRTFParser.

Private Declare Function GlobalAlloc Lib "kernel32" _
                (ByVal wFlags&, ByVal dwBytes As Long) As Long
Private Declare Function GlobalLock Lib "kernel32" _
                (ByVal hMem As Long) As Long
Private Declare Function GlobalUnlock Lib "kernel32" _
                (ByVal hMem As Long) As Long
Private Declare Function lstrcpy Lib "kernel32" _
                (ByVal lpString1 As Any, ByVal lpString2 As Any) As Long

Private Declare Function OpenClipboard Lib "user32" _
                (ByVal Hwnd As Long) As Long
Private Declare Function EmptyClipboard Lib "user32" () As Long
Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
                "RegisterClipboardFormatA" (ByVal lpString As String) As Long
Private Declare Function SetClipboardData Lib "user32" _
                (ByVal wFormat As Long, ByVal hMem As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long

'---'

Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'

Private Sub Class_Initialize()
Set wdDoc = New Word.Document
End Sub

Private Sub Class_Terminate()
wdDoc.Close False
Set wdDoc = Nothing
End Sub

'---'

Private Function CopyRTF(strCopyString As String) As Boolean
Dim hGlobalMemory  As Long
Dim lpGlobalMemory As Long
Dim hClipMemory    As Long
Dim lngFormatRTF   As Long

'Allocate and copy string to memory'
hGlobalMemory = GlobalAlloc(&H42, Len(strCopyString) + 1)
lpGlobalMemory = GlobalLock(hGlobalMemory)
lpGlobalMemory = lstrcpy(lpGlobalMemory, strCopyString)

'Unlock the memory and then copy to the clipboard'
If GlobalUnlock(hGlobalMemory) = 0 Then
    If OpenClipboard(0&) <> 0 Then
        Call EmptyClipboard

        'Save the data as Rich Text Format'
        lngFormatRTF = RegisterClipboardFormat("Rich Text Format")
        hClipMemory = SetClipboardData(lngFormatRTF, hGlobalMemory)

        CopyRTF = CBool(CloseClipboard)
    End If
End If
End Function

'---'

Private Function PasteRTF() As String
Dim strOutput As String

'Paste the clipboard data to the wdDoc and read the plain text result'
wdDoc.Range.Paste
strOutput = wdDoc.Range.Text

'Get rid of the new lines at the beginning and end of the document'
strOutput = Left(strOutput, Len(strOutput) - 2)
strOutput = Right(strOutput, Len(strOutput) - 2)

PasteRTF = strOutput
End Function

'---'

Public Function ParseRTF(strRTF As String) As String
If CopyRTF(strRTF) Then
    ParseRTF = PasteRTF
Else
    ParseRTF = "Error in copying to clipboard"
End If
End Function

您可以使用类似于以下内容的方法为12,000个单元格中的每个单元格调用它:

You could call it for each of your 12,000 cells using something similar to this:

Sub CopyParseAllRange()
Dim rngCell As Range
Dim strRTF  As String

'Create new instance of clsRTFParser'
Dim RTFParser As clsRTFParser
Set RTFParser = New clsRTFParser

For Each rngCell In Range("A1:A12000")

    'Parse the cell contents'
    strRTF = RTFParser.ParseRTF(CStr(rngCell))

    'Output to the cell one column over'
    rngCell.Offset(0, 1) = strRTF
Next
End Sub

我已经在计算机上使用示例RTF字符串对此进行了模拟.对于12,000个电池,它花了两分半钟,这是一个更合理的时间范围!

I have simulated this using example RTF strings on my machine. For 12,000 cells it took two and a half minutes, a much more reasonable time frame!

这篇关于将Excel中的RTF格式(带有格式标记)转换为未格式化的文本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆