Removing a string that includes CRLF characters from the body of an e-mail

Problem description

I am trying to remove a string from selected incoming MS Outlook (2016) e-mail.

The string consists of two sentences in German. I use the Replace() function. This works in principle. (See my full procedure below.)

The two sentences are sometimes separated by CRLF (Carriage Return, Line Feed) characters, and these are not always at the same place. This seems to be the result of these e-mails passing through various devices before they land in my Outlook inbox.

Before addressing the issue of the CRLF appearing in varying places, I want to create a procedure that deals with strings with CRLF at fixed positions.

This is how the source code of such a string would look:

(Screenshot history: I saved the e-mail as .html on my hard disk, then opened the .html file in Notepad++ to see the CRLF characters.)

The html tags are not that relevant for me. They can remain in the e-mail. (In fact, the formatting tags vary, too, so it is better to not start tackling them at all.) My only concern is to remove the visible part, i.e. the text "Diese E-Mail kommt... vertrauenswürdig halten".

I tried to catch text with line breaks by including the CR LF part as Chr():

strDelete01 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

My procedure does not recognize the string, and accordingly does nothing.

Public Sub EditBodyCgReplace()

'Declarations
   Dim obj As Object
   Dim Sel As Outlook.Selection
   Dim DoSave As Boolean
   Dim NewBody As String
   Dim strDelete01 As String
   Dim strDelete02 As String
   Dim strDelete03 As String
   Dim strDelete04 As String

'Fill the variables 
   strDelete01 = "Diese E-Mail kommt von Personen außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge, wenn Sie die Personen für vertrauenswürdig halten."
   strDelete02 = "################################################################################"
   strDelete03 = "<hr>"
   strDelete04 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

'Note: I am playing here with various types of strings at once. For example, 
'the procedure will also remove <hr> lines and "#####" strings  

'Work with it 
    If TypeOf Application.ActiveWindow Is Outlook.Inspector Then
        Set obj = Application.ActiveInspector.CurrentItem
    Else
        Set Sel = Application.ActiveExplorer.Selection
        If Sel.Count Then
            Set obj = Sel(1)
            DoSave = True
        End If
    End If

    If Not obj Is Nothing Then
        NewBody = Replace(obj.HTMLBody, strDelete01, "")
        NewBody = Replace(obj.HTMLBody, strDelete02, "")
        NewBody = Replace(obj.HTMLBody, strDelete03, "")
        NewBody = Replace(obj.HTMLBody, strDelete04, "")

        If NewBody <> "" Then
            obj.HTMLBody = NewBody
            If DoSave Then
                obj.Save
            End If
        End If
    End If
    End Sub

Question: What can I do to include the CRLF in the search string?

Follow-up question: What can I do to remove such strings with CRLF included in varying places? Is there a way to use regular expressions? Can VBA in Outlook deal with it? - Idea: if regular expressions work, perhaps the entire CRLF issue is not an issue anymore, as the expression would look something like

"Diese E-Mail kommt von * vertrauenswürdig halten."

and thus include anything - including CRLF - in the middle?

After doing various experiments I am starting to feel that MS Outlook does not use HTML at all in its e-mails?

I observe that I can address practically no html code in obj.HTMLBody. I can address plain text. I cannot address parts of html such as "<hr ", or at least that is what I believe I am observing. (There was a moment when I could address "<hr>" and thus delete it, but I cannot recreate the conditions under which this worked yesterday.)

I can save the e-mails as html files (outside Outlook, somewhere on my hard disk in a separate folder), and in these files I do see the CRLF and other stuff. But perhaps the e-mails, as long as they are kept in Outlook itself, are stored using some other code?

So what is this code, and how can I address parts of it for deleting?

Recommended answer

I will answer your question in parts as I have the spare time. Someone else may get to the important bit before I do.

I have edited your question. I did not understand a couple of sentences so I looked at the source and found my suspicion was correct: you had included less-than characters. Stack Overflow permits a limited number of Html tags. Anything else that looks like an Html tag is ignored. I replaced each "<" with "&lt;" so readers could see your Html. I can add an explanation if you do not understand why this works.

You have:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(obj.HTMLBody, strDelete02, "")
NewBody = Replace(obj.HTMLBody, strDelete03, "")
NewBody = Replace(obj.HTMLBody, strDelete04, "")
If NewBody <> "" Then

Each Replace (except the first) overwrites the value of NewBody created by the previous Replace. You seem to think that if strDelete04 is not found, NewBody will be empty. No, if strDelete04 is not found, NewBody will be a copy of obj.HTMLBody.

You need something like:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(NewBody, strDelete02, "")
NewBody = Replace(NewBody, strDelete03, "")
NewBody = Replace(NewBody, strDelete04, "")
If NewBody <> obj.HTMLBody Then
  ' One or more delete strings found and removed
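
The chaining above can also be written as a loop over a list of delete strings. The following is only a sketch: RemoveAll and DemoRemoveAll are hypothetical names that do not appear in the question, and the sample body is made up for illustration.

```vba
' Hypothetical helper (not part of the question's code): remove every
' string in DeleteStrings from Body and return the result.
Function RemoveAll(ByVal Body As String, ByVal DeleteStrings As Variant) As String

  Dim Inx As Long

  For Inx = LBound(DeleteStrings) To UBound(DeleteStrings)
    ' Feed the result of each Replace into the next one
    Body = Replace(Body, DeleteStrings(Inx), "")
  Next

  RemoveAll = Body

End Function

Sub DemoRemoveAll()

  Dim Before As String
  Dim After As String

  ' Made-up sample body for illustration only
  Before = "<p>Keep this.</p><hr>#####<p>Keep that.</p>"
  After = RemoveAll(Before, Array("<hr>", "#####"))

  If After <> Before Then
    ' One or more delete strings found and removed
    Debug.Print After    ' Prints: <p>Keep this.</p><p>Keep that.</p>
  End If

End Sub
```

Comparing the result with the original, rather than with an empty string, is what tells you whether anything was removed.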

You say that the CRLFs are not in fixed positions. If so, no simple modification of your code will have the effect you seek. I will show you how to achieve the effect you seek but first I will have to create some emails containing your text so I can test my code.

Part 2

Having looked at your image of the Html more closely, I believe there is a simple solution. The two CRLFs in the text replace spaces. Provided this is always what happens, you can use:

NewBody = Replace(obj.HTMLBody, vbCr & vbLf, " ")

This would replace every CRLF with a space, wherever it appears within the Html. It would not matter if there were extra CRLFs because any string of whitespace characters (which includes CR and LF) in an Html document is replaced by a single space when the document is displayed.

You finish the removal of the unwanted text with:

Dim strDelete As String

strDelete = "Diese E-Mail kommt von Personen außerhalb " & _
            "der Stadtverwaltung. Klicken Sie nur auf " & _
            "Links oder Dateianhänge, wenn Sie die Personen " & _
            "für vertrauenswürdig halten."

NewBody = Replace(NewBody, strDelete, "")
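
The question also asks whether a regular expression could cope with CRLFs in varying places. A possible sketch, assuming the late-bound VBScript.RegExp object is available on the machine: build a pattern from the phrase in which every space matches any run of whitespace. RemoveIgnoringWhitespace and the demo routine are hypothetical names, and the sketch assumes that the phrase appears in HTMLBody as literal characters with no tags, entities or non-breaking spaces in the middle (the `\s` class of VBScript.RegExp does not match Chr(160)).

```vba
' Hypothetical helper: remove Phrase from Body, treating each space in
' Phrase as "one or more whitespace characters" (space, tab, CR, LF).
Function RemoveIgnoringWhitespace(ByVal Body As String, _
                                  ByVal Phrase As String) As String

  Dim Ch As String
  Dim Inx As Long
  Dim Pat As String
  Dim Rx As Object

  ' Build the pattern character by character
  For Inx = 1 To Len(Phrase)
    Ch = Mid$(Phrase, Inx, 1)
    If Ch = " " Then
      Pat = Pat & "\s+"
    ElseIf InStr("\^$.|?*+()[]{}", Ch) > 0 Then
      ' Escape regex metacharacters so they match literally
      Pat = Pat & "\" & Ch
    Else
      Pat = Pat & Ch
    End If
  Next

  Set Rx = CreateObject("VBScript.RegExp")
  Rx.Global = True
  Rx.Pattern = Pat

  RemoveIgnoringWhitespace = Rx.Replace(Body, "")

End Function

Sub DemoRemoveIgnoringWhitespace()

  Dim Body As String

  ' Made-up sample body with a CRLF where the phrase has a space
  Body = "<p>Diese E-Mail kommt von Personen" & vbCr & vbLf & _
         "außerhalb der Stadtverwaltung.</p><p>Hallo</p>"

  ' Prints: <p></p><p>Hallo</p>
  Debug.Print RemoveIgnoringWhitespace(Body, _
      "Diese E-Mail kommt von Personen außerhalb der Stadtverwaltung.")

End Sub
```

Unlike replacing every CRLF in the body, this leaves the rest of the Html untouched.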

If the above does not work, you need a more convenient diagnostic technique. Saving the entire email as Html may be easy but you cannot be quite sure how the result differs from what a VBA macro would see. You wonder if Outlook stores emails in a format other than Html. I cannot imagine why Outlook would convert the incoming SMTP message to some secret format and then convert it back when the user wishes to view it. If Outlook does have a secret format, it is totally hidden from the VBA programmer.

The following is a simple version of the diagnostic tool I use. If you need something more advanced, I can provide it but let us try this first.

Copy the code below to an Outlook module. Select one of these emails and then run macro DsplHtmlBodyFromSelectedEmails. The entire Html body of the email will be output to the Immediate Window in a readable format. I believe I have included all the subroutines called by the macro. I apologise in advance if I have not. If you get a message about an undefined routine, let me know and I will add it to the answer.

Sub DsplHtmlBodyFromSelectedEmails()

  ' Select one or more emails then run this macro.  For each selected email,
  ' the Received Time, the Subject and the Html body are output to the
  ' Immediate Window.  Note: the Immediate Window can only display about
  ' 200 lines before the older lines are lost.

  Dim Exp As Explorer
  Dim Html As String
  Dim ItemCrnt As Object   ' Object so that non-mail items reach the .Class test

  Set Exp = Outlook.Application.ActiveExplorer

  If Exp.Selection.Count = 0 Then
    Call MsgBox("Please select one or more emails then try again", vbOKOnly)
    Exit Sub
  Else
    For Each ItemCrnt In Exp.Selection
      With ItemCrnt
        If .Class = olMail Then
          Debug.Print .ReceivedTime & " " & .Subject
          Html = ""   ' Start afresh so earlier emails are not output again
          Call OutLongTextRtn(Html, "Html", .HtmlBody)
          Debug.Print Html
        End If
      End With
    Next
  End If

End Sub
Sub OutLongTextRtn(ByRef TextOut As String, ByVal Head As String, _
                          ByVal TextIn As String)

  ' * Break TextIn into lines of not more than 100 characters
  '   and append to TextOut.
  ' * The output is arranged so:
  '     xxxx|sssssssssssssss|
  '         |sssssssssssssss|
  '         |ssssssssss|
  '   where "xxxx" is the value of Head and "ssss..." are characters from
  '         TextIn.  The third line in the example could be shorter because:
  '           * it contains the last few characters of TextIn
  '           * there is a linefeed in TextIn
  '           * a <xxx> string recording whitespace would have been split
  '             across two lines.

  If TextIn = "" Then
    ' Nothing to do
    Exit Sub
  End If

  Const LenLineMax As Long = 100

  Dim PosBrktEnd As Long     ' Last > before PosEnd
  Dim PosBrktStart As Long   ' Last < before PosEnd
  Dim PosNext As Long        ' Start of block to be output after current block
  Dim PosStart As Long       ' First character of TextIn not yet output

  TextIn = TidyTextForDspl(TextIn)
  TextIn = Replace(TextIn, "lf›", "lf›" & vbLf)

  PosStart = 1
  Do While True
    PosNext = InStr(PosStart, TextIn, vbLf)
    If PosNext = 0 Then
      ' No LF in [Remaining] TextIn
      'Debug.Assert False
      PosNext = Len(TextIn) + 1
    End If
    If PosNext - PosStart > LenLineMax Then
      PosNext = PosStart + LenLineMax
    End If
    ' Check for <xxx> being split across lines
    PosBrktStart = InStrRev(TextIn, "‹", PosNext - 1)
    PosBrktEnd = InStrRev(TextIn, "›", PosNext - 1)
    If PosBrktStart < PosStart And PosBrktEnd < PosStart Then
      ' No <xxx> within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And PosBrktEnd > 0 And PosBrktEnd > PosBrktStart Then
      ' Last or only <xxx> totally within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And _
           (PosBrktEnd = 0 Or (PosBrktEnd > 0 And PosBrktEnd < PosBrktStart)) Then
      ' Last or only <xxx> will be split across rows
      'Debug.Assert False
      PosNext = PosBrktStart
    Else
      ' Are there other combinations?
      Debug.Assert False
    End If

    'Debug.Assert Right$(Mid$(TextIn, PosStart, PosNext - PosStart), 1) <> "‹"

    If TextOut <> "" Then
      TextOut = TextOut & vbLf
    End If
    If PosStart = 1 Then
      TextOut = TextOut & Head & "|"
    Else
      TextOut = TextOut & Space(Len(Head)) & "|"
    End If
    TextOut = TextOut & Mid$(TextIn, PosStart, PosNext - PosStart) & "|"
    PosStart = PosNext
    If Mid$(TextIn, PosStart, 1) = vbLf Then
      PosStart = PosStart + 1
    End If
    If PosStart > Len(TextIn) Then
      Exit Do
    End If
  Loop

End Sub
Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for display by replacing white space with visible strings:
  '   Leave single space unchanged
  '   Replace single LF by                 ‹lf›
  '   Replace single CR by                 ‹cr›
  '   Replace single TB by                 ‹tb›
  '   Replace single non-break space by    ‹nbs›
  '   Replace single CRLF by               ‹crlf›
  '   Replace multiple spaces by           ‹n s›       where n is number of repeats
  '   Replace multiple LFs by              ‹n lf›      of white space character
  '   Replace multiple CRs by              ‹n cr›
  '   Replace multiple TBs by              ‹n tb›
  '   Replace multiple non-break spaces by ‹n nbs›
  '   Replace multiple CRLFs by            ‹n crlf›

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = VBA.Array(" ", vbCr & vbLf, vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = VBA.Array("s", "crlf", "lf", "cr", "tb", "nbs")

  RetnVal = Text

  ' Replace each whitespace individually
  For InxWsChar = 0 To UBound(WsCharValue)
    RetnVal = Replace(RetnVal, WsCharValue(InxWsChar), "‹" & WsCharDspl(InxWsChar) & "›")
  Next

  ' Look for repeats. If found replace <x> by <n x>
  For InxWsChar = 0 To UBound(WsCharValue)
    'Debug.Assert InxWsChar <> 1
    PosWsChar = 1
    Do While True
      InsStr = "‹" & WsCharDspl(InxWsChar) & "›"
      PosWsChar = InStr(PosWsChar, RetnVal, InsStr & InsStr)
      If PosWsChar = 0 Then
        ' No [more] repeats of this <x>
        Exit Do
      End If
      ' Have <x><x>.  Count number of extra <x>s
      NumWsChar = 2
      Do While Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr), Len(InsStr)) = InsStr
        NumWsChar = NumWsChar + 1
      Loop
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & _
                "‹" & NumWsChar & " " & WsCharDspl(InxWsChar) & "›" & _
                Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr))
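      ' Step past the inserted "‹n x›".  Len(NumWsChar) would give the storage
      ' size of a Long (4), not the number of digits, so convert to a string.
      PosWsChar = PosWsChar + Len(InsStr) + Len(CStr(NumWsChar)) + 1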

    Loop
  Next

  ' Restore any single spaces
  RetnVal = Replace(RetnVal, "‹" & WsCharDspl(0) & "›", " ")

  TidyTextForDspl = RetnVal

End Function
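
A lighter check than the routines above is to ask only two things of the selected email: how many CRLF pairs its Html body contains, and what the text around the unwanted phrase looks like. This is a sketch only. CheckPhraseInSelectedEmail is a hypothetical name, and it assumes that the first selected item is an email and that the phrase starts with "Diese E-Mail kommt".

```vba
Sub CheckPhraseInSelectedEmail()

  Dim Body As String
  Dim PosPhrase As Long
  Dim Sel As Outlook.Selection

  Set Sel = Application.ActiveExplorer.Selection
  If Sel.Count = 0 Then
    Exit Sub
  End If

  ' Assumption: the first selected item is an email
  Body = Sel(1).HTMLBody

  ' Each CRLF is two characters, so halve the reduction in length
  Debug.Print "CRLF pairs: " & _
              (Len(Body) - Len(Replace(Body, vbCr & vbLf, ""))) / 2

  PosPhrase = InStr(1, Body, "Diese E-Mail kommt", vbTextCompare)
  If PosPhrase = 0 Then
    Debug.Print "Start of phrase not found"
  Else
    ' Show 200 characters from the start of the phrase with CR and LF visible
    Debug.Print Replace(Replace(Mid$(Body, PosPhrase, 200), _
                                vbCr, "<cr>"), vbLf, "<lf>")
  End If

End Sub
```

If the start of the phrase is not found at all, the text in HTMLBody probably differs from the saved .html file in more than its line breaks, for example in how the umlauts are encoded.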
