Removing a string that includes CRLF characters from the body of an e-mail

Question

I am trying to remove a string from selected incoming MS Outlook (2016) e-mail.

The string consists of two sentences in German. I use the Replace() function. This works in principle. (See my full procedure below.)

The two sentences are sometimes separated by CRLF (carriage return, line feed) characters, and these are not always in the same place. This seems to be the result of these e-mails passing through various devices before they land in my Outlook inbox.

Before addressing the issue of the CRLF appearing in varying places, I want to create a procedure that deals with strings with CRLF at fixed positions.

How the source code of such a string would look:

(Screenshot history: I saved the e-mail as .html on my hard disk, then opened the .html file in Notepad++ to see the CRLF characters.)

The HTML tags are not that relevant to me. They can remain in the e-mail. (In fact, the formatting tags vary too, so it is better not to start tackling them at all.) My only concern is to remove the visible part, i.e. the text "Diese E-Mail kommt... vertrauenswürdig halten".

I tried to catch text with line breaks by including the CR LF part as Chr():

strDelete01 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

My procedure does not recognize the string, and accordingly does nothing.

Public Sub EditBodyCgReplace()

'Declarations
   Dim obj As Object
   Dim Sel As Outlook.Selection
   Dim DoSave As Boolean
   Dim NewBody As String
   Dim strDelete01 As String
   Dim strDelete02 As String
   Dim strDelete03 As String
   Dim strDelete04 As String

'Fill the variables 
   strDelete01 = "Diese E-Mail kommt von Personen außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge, wenn Sie die Personen für vertrauenswürdig halten."
   strDelete02 = "################################################################################"
   strDelete03 = "<hr>"
   strDelete04 = "Diese E-Mail kommt von Personen" & Chr(13) & Chr(10) & "außerhalb der Stadtverwaltung. Klicken Sie nur auf Links oder Dateianhänge," & Chr(13) & Chr(10) & "wenn Sie die Personen für vertrauenswürdig halten."

'Note: I am playing here with various types of strings at once. For example, 
'the procedure will also remove <hr> lines and "#####" strings  

'Work with it 
    If TypeOf Application.ActiveWindow Is Outlook.Inspector Then
        Set obj = Application.ActiveInspector.CurrentItem
    Else
        Set Sel = Application.ActiveExplorer.Selection
        If Sel.Count Then
            Set obj = Sel(1)
            DoSave = True
        End If
    End If

    If Not obj Is Nothing Then
        NewBody = Replace(obj.HTMLBody, strDelete01, "")
        NewBody = Replace(obj.HTMLBody, strDelete02, "")
        NewBody = Replace(obj.HTMLBody, strDelete03, "")
        NewBody = Replace(obj.HTMLBody, strDelete04, "")

        If NewBody <> "" Then
            obj.HTMLBody = NewBody
            If DoSave Then
                obj.Save
            End If
        End If
    End If
    End Sub

Question: What can I do to include the CRLF in the search string?

Follow-up question: What can I do to remove such strings when the CRLFs are in varying places? Is there a way to use regular expressions? Can VBA in Outlook deal with them? Idea: if regular expressions work, perhaps the entire CRLF issue is no longer an issue, as the expression would look something like

"Diese E-Mail kommt von * vertrauenswürdig halten."

and thus include anything - including CRLF - in the middle?

After doing various experiments I am starting to feel that MS Outlook does not use HTML at all in its e-mails?

I observe that I can address practically no HTML code in obj.HTMLBody. I can address plain text. I cannot address parts of HTML such as "<hr ", or at least that is what I believe I am observing. (There was a moment when I could address "<hr>" and thus delete it, but I cannot recreate the conditions under which this worked yesterday.)

I can save the e-mails as HTML files (outside Outlook, somewhere on my hard disk in a separate folder), and in these files I do see the CRLF and other stuff. But perhaps the e-mails, as long as they are kept in Outlook itself, are stored using some other code?

So what is this code, and how can I address parts of it for deleting?

Recommended answer

I will answer your question in parts as I have the spare time. Someone else may get to the important bit before I do.

I have edited your question. I did not understand a couple of sentences, so I looked at the source and found my suspicion was correct: you had included less-than characters. Stack Overflow permits a limited number of Html tags; anything else that looks like an Html tag is ignored. I replaced each "<" with "&lt;" so readers could see your Html. I can add an explanation if you do not understand why this works.

You have:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(obj.HTMLBody, strDelete02, "")
NewBody = Replace(obj.HTMLBody, strDelete03, "")
NewBody = Replace(obj.HTMLBody, strDelete04, "")
If NewBody <> "" Then

Each Replace (except the first) overwrites the value of NewBody created by the previous Replace. You seem to think that if strDelete04 is not found, NewBody will be empty. No, if strDelete04 is not found, NewBody will be a copy of obj.HTMLBody.

You need something like:

NewBody = Replace(obj.HTMLBody, strDelete01, "")
NewBody = Replace(NewBody, strDelete02, "")
NewBody = Replace(NewBody, strDelete03, "")
NewBody = Replace(NewBody, strDelete04, "")
If NewBody <> obj.HTMLBody Then
  ' One or more delete strings found and removed
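
The chained calls can also be wrapped in a small helper, so that further delete strings can be added without repeating the pattern. This is only a sketch: RemoveAll and DemoRemoveAll are names invented here for illustration and are not part of your procedure.

```vba
' Hypothetical helper: removes every string in DeleteStrings from Body.
' Each Replace works on the result of the previous one, so no removal is lost.
Function RemoveAll(ByVal Body As String, ParamArray DeleteStrings() As Variant) As String

  Dim Inx As Long

  For Inx = LBound(DeleteStrings) To UBound(DeleteStrings)
    Body = Replace(Body, DeleteStrings(Inx), "")
  Next

  RemoveAll = Body

End Function

' Small demonstration on a literal string instead of a real email body.
Sub DemoRemoveAll()

  Dim TextBefore As String
  Dim TextAfter As String

  TextBefore = "<p>Keep</p><hr>#####"
  TextAfter = RemoveAll(TextBefore, "<hr>", "#####")

  Debug.Print TextAfter                 ' <p>Keep</p>
  Debug.Print TextAfter <> TextBefore   ' True: at least one string was removed

End Sub
```

In your procedure the call would be along the lines of NewBody = RemoveAll(obj.HTMLBody, strDelete01, strDelete02, strDelete03, strDelete04), followed by the comparison of NewBody with obj.HTMLBody.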

You say that the CRLFs are not in fixed positions. If so, no simple modification of your code will have the effect you seek. I will show you how to achieve the effect you seek but first I will have to create some emails containing your text so I can test my code.

Part 2

Having looked at your image of the Html more closely, I believe there is a simple solution. The two CRLFs in the text replace spaces. Provided this is always what happens, you can use:

NewBody = Replace(obj.HTMLBody, vbCr & vbLf, " ")

This replaces every CRLF with a space, wherever it appears within the Html. It does not matter that CRLFs elsewhere in the Html are replaced too, because any string of whitespace characters (which includes CR and LF) in an Html document is replaced by a single space when the document is displayed.

You finish the removal of the unwanted text with:

Dim strDelete As String

' VBA does not allow a value to be assigned in the Dim statement
strDelete = "Diese E-Mail kommt von Personen außerhalb " & _
            "der Stadtverwaltung. Klicken Sie nur auf " & _
            "Links oder Dateianhänge, wenn Sie die Personen " & _
            "für vertrauenswürdig halten."

NewBody = Replace(NewBody, strDelete, "")
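
If the CRLFs turn out not to replace exactly one space each, a regular expression that accepts any run of whitespace between the words may be more tolerant. The following is only a sketch under assumptions: RemoveNotice is a name made up for illustration, the VBScript regular expression library must be available to CreateObject on your machine, and the pattern will not match if an Html tag or a non-break space sits inside the notice.

```vba
' Hypothetical helper, not part of the original macro.
' Removes the notice from Html even if its words are separated by
' CRLFs, tabs or several spaces instead of single spaces.
Function RemoveNotice(ByVal Html As String) As String

  Dim Notice As String
  Dim Pattern As String
  Dim Re As Object

  Notice = "Diese E-Mail kommt von Personen außerhalb " & _
           "der Stadtverwaltung. Klicken Sie nur auf " & _
           "Links oder Dateianhänge, wenn Sie die Personen " & _
           "für vertrauenswürdig halten."

  ' "." is the only regex metacharacter in Notice; make it literal.
  Pattern = Replace(Notice, ".", "\.")
  ' Allow one or more whitespace characters wherever Notice has a space.
  Pattern = Replace(Pattern, " ", "\s+")

  Set Re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
  Re.Global = True                           ' remove every occurrence
  Re.IgnoreCase = False
  Re.Pattern = Pattern

  RemoveNotice = Re.Replace(Html, "")

End Function
```

It would be called as NewBody = RemoveNotice(obj.HTMLBody). Because \s also matches CR and LF, the separate step that replaces CRLFs with spaces would not be needed with this approach.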

If the above does not work, you need a more convenient diagnostic technique. Saving the entire email as Html may be easy but you cannot be quite sure how the result differs from what a VBA macro would see. You wonder if Outlook stores emails in a format other than Html. I cannot imagine why Outlook would convert the incoming SMTP message to some secret format and then convert it back when the user wishes to view it. If Outlook does have a secret format, it is totally hidden from the VBA programmer.

The following is a simple version of the diagnostic tool I use. If you need something more advanced, I can provide it but let us try this first.

Copy the code below to an Outlook module. Select one of these emails and then run macro DsplHtmlBodyFromSelectedEmails. The entire Html body of the email will be output to the Immediate Window in a readable format. I believe I have included all the subroutines called by the macro. I apologise in advance if I have not. If you get a message about an undefined routine, let me know and I will add it to the answer.

Sub DsplHtmlBodyFromSelectedEmails()

  ' Select one or more emails then run this macro.  For each selected email,
  ' the Received Time, the Subject and the Html body are output to the
  ' Immediate Window.  Note: the Immediate Window can only display about 200
  ' lines before the older lines are lost.

  Dim Exp As Explorer
  Dim Html As String
  Dim ItemCrnt As Object    ' Object, not MailItem: a selection can contain non-mail items

  Set Exp = Outlook.Application.ActiveExplorer

  If Exp.Selection.Count = 0 Then
    Call MsgBox("Please select one or more emails then try again", vbOKOnly)
    Exit Sub
  Else
    For Each ItemCrnt In Exp.Selection
      With ItemCrnt
        If .Class = olMail Then
          Debug.Print .ReceivedTime & " " & .Subject
          Html = ""    ' OutLongTextRtn appends, so clear the previous email's output
          Call OutLongTextRtn(Html, "Html", .HtmlBody)
          Debug.Print Html
        End If
      End With
    Next
  End If

End Sub
Sub OutLongTextRtn(ByRef TextOut As String, ByVal Head As String, _
                          ByVal TextIn As String)

  ' * Break TextIn into lines of not more than 100 characters
  '   and append to TextOut.
  ' * The output is arranged so:
  '     xxxx|sssssssssssssss|
  '         |sssssssssssssss|
  '         |ssssssssss|
  '   where "xxxx" is the value of Head and "ssss..." are characters from
  '         TextIn.  The third line in the example could be shorter because:
  '           * it contains the last few characters of TextIn
  '           * there is a linefeed in TextIn
  '           * a <xxx> string recording whitespace would have been split
  '             across two lines.

  If TextIn = "" Then
    ' Nothing to do
    Exit Sub
  End If

  Const LenLineMax As Long = 100

  Dim PosBrktEnd As Long     ' Last > before PosEnd
  Dim PosBrktStart As Long   ' Last < before PosEnd
  Dim PosNext As Long        ' Start of block to be output after current block
  Dim PosStart As Long       ' First character of TextIn not yet output

  TextIn = TidyTextForDspl(TextIn)
  TextIn = Replace(TextIn, "lf›", "lf›" & vbLf)

  PosStart = 1
  Do While True
    PosNext = InStr(PosStart, TextIn, vbLf)
    If PosNext = 0 Then
      ' No LF in [Remaining] TextIn
      'Debug.Assert False
      PosNext = Len(TextIn) + 1
    End If
    If PosNext - PosStart > LenLineMax Then
      PosNext = PosStart + LenLineMax
    End If
    ' Check for <xxx> being split across lines
    PosBrktStart = InStrRev(TextIn, "‹", PosNext - 1)
    PosBrktEnd = InStrRev(TextIn, "›", PosNext - 1)
    If PosBrktStart < PosStart And PosBrktEnd < PosStart Then
      ' No <xxx> within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And PosBrktEnd > 0 And PosBrktEnd > PosBrktStart Then
      ' Last or only <xxx> totally within text to be displayed
      ' No change to PosNext
      'Debug.Assert False
    ElseIf PosBrktStart > 0 And _
           (PosBrktEnd = 0 Or (PosBrktEnd > 0 And PosBrktEnd < PosBrktStart)) Then
      ' Last or only <xxx> will be split across rows
      'Debug.Assert False
      PosNext = PosBrktStart
    Else
      ' Are there other combinations?
      Debug.Assert False
    End If

    'Debug.Assert Right$(Mid$(TextIn, PosStart, PosNext - PosStart), 1) <> "‹"

    If TextOut <> "" Then
      TextOut = TextOut & vbLf
    End If
    If PosStart = 1 Then
      TextOut = TextOut & Head & "|"
    Else
      TextOut = TextOut & Space(Len(Head)) & "|"
    End If
    TextOut = TextOut & Mid$(TextIn, PosStart, PosNext - PosStart) & "|"
    PosStart = PosNext
    If Mid$(TextIn, PosStart, 1) = vbLf Then
      PosStart = PosStart + 1
    End If
    If PosStart > Len(TextIn) Then
      Exit Do
    End If
  Loop

End Sub
Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for display by replacing white space with visible strings:
  '   Leave single space unchanged
  '   Replace single LF by                 ‹lf›
  '   Replace single CR by                 ‹cr›
  '   Replace single TB by                 ‹tb›
  '   Replace single non-break space by    ‹nbs›
  '   Replace single CRLF by               ‹crlf›
  '   Replace multiple spaces by           ‹n s›       where n is number of repeats
  '   Replace multiple LFs by              ‹n lf›      of white space character
  '   Replace multiple CRs by ‹cr› or      ‹n cr›
  '   Replace multiple TBs by              ‹n tb›
  '   Replace multiple non-break spaces by ‹n nbs›
  '   Replace multiple CRLFs by            ‹n crlf›

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = VBA.Array(" ", vbCr & vbLf, vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = VBA.Array("s", "crlf", "lf", "cr", "tb", "nbs")

  RetnVal = Text

  ' Replace each whitespace individually
  For InxWsChar = 0 To UBound(WsCharValue)
    RetnVal = Replace(RetnVal, WsCharValue(InxWsChar), "‹" & WsCharDspl(InxWsChar) & "›")
  Next

  ' Look for repeats. If found replace <x> by <n x>
  For InxWsChar = 0 To UBound(WsCharValue)
    'Debug.Assert InxWsChar <> 1
    PosWsChar = 1
    Do While True
      InsStr = "‹" & WsCharDspl(InxWsChar) & "›"
      PosWsChar = InStr(PosWsChar, RetnVal, InsStr & InsStr)
      If PosWsChar = 0 Then
        ' No [more] repeats of this <x>
        Exit Do
      End If
      ' Have <x><x>.  Count number of extra <x>s
      NumWsChar = 2
      Do While Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr), Len(InsStr)) = InsStr
        NumWsChar = NumWsChar + 1
      Loop
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & _
                "‹" & NumWsChar & " " & WsCharDspl(InxWsChar) & "›" & _
                Mid(RetnVal, PosWsChar + NumWsChar * Len(InsStr))
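      ' Step past the replacement.  CStr is needed because Len of a Long
      ' variable returns its size in bytes, not its number of digits.
      PosWsChar = PosWsChar + Len(InsStr) + Len(CStr(NumWsChar))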

    Loop
  Next

  ' Restore any single spaces
  RetnVal = Replace(RetnVal, "‹" & WsCharDspl(0) & "›", " ")

  TidyTextForDspl = RetnVal

End Function
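
If the whitespace display still leaves doubt about what separates the words of the notice, listing the raw character codes from the start of the notice may help. This is a hedged sketch rather than part of my diagnostic tool: DsplCharCodes is a name invented here, and the marker text and character count in the usage note are only examples.

```vba
' Hypothetical helper, not one of the routines above.
' Prints the position, the character code and (for visible characters) the
' character itself for NumChars characters of Text, starting at Marker.
Sub DsplCharCodes(ByVal Text As String, ByVal Marker As String, _
                  ByVal NumChars As Long)

  Dim ChCrnt As String
  Dim PosCrnt As Long
  Dim PosStart As Long

  PosStart = InStr(1, Text, Marker)
  If PosStart = 0 Then
    Debug.Print "Marker not found: " & Marker
    Exit Sub
  End If

  For PosCrnt = PosStart To PosStart + NumChars - 1
    If PosCrnt > Len(Text) Then
      Exit For
    End If
    ChCrnt = Mid$(Text, PosCrnt, 1)
    If AscW(ChCrnt) > 32 Then
      Debug.Print PosCrnt & "  " & AscW(ChCrnt) & "  " & ChCrnt
    Else
      ' Control character or space: show the code only
      Debug.Print PosCrnt & "  " & AscW(ChCrnt)
    End If
  Next

End Sub
```

Inside the With block of DsplHtmlBodyFromSelectedEmails it could be called as Call DsplCharCodes(.HtmlBody, "Diese E-Mail", 80). In the output, 13 followed by 10 is a CRLF, 32 is an ordinary space, 160 is a non-break space and 60 is the "<" that starts an Html tag.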
