Regex for detecting emails in text

Question

I have a regex in C# that detects emails in text, and I then wrap each one in an anchor tag with a mailto: href to make it clickable. But if an email is already inside an anchor tag, the regex still detects it, and the next piece of code wraps it in another anchor tag. Is there any way in regex to skip emails that are already in an anchor tag?

The regex code in C# is:

string sRegex = @"([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";

Regex Regx = new Regex(sRegex, RegexOptions.IgnoreCase);

and the sample text is:

string sContent = "ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com";

and the desired output is:

"ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc <a href='mailto:email@email.com'>email@email.com</a>";

So the whole point is that the regex should detect only valid emails that are not already inside an anchor tag (that is, not already clickable) and that are not the href value of an anchor tag either.

The regex given above detects every possible email in the text, which is not what I want.

Recommended answer

Could you use a negative lookbehind to test for mailto:

(?<!mailto\:)([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

This should match only addresses that are not immediately preceded by mailto:

I think what is happening is that the . in ([\w\-]+(.[\w-])+) is matching too much, because an unescaped . matches any character. Did you mean to use \. rather than . ?
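To illustrate the point about the dot, here is a minimal sketch comparing an unescaped `.` with an escaped `\.` in a simplified domain pattern. The class name `DotEscapeDemo`, its method names, and the test strings are illustrative assumptions, not part of the original question or answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are chosen for illustration only.
public static class DotEscapeDemo
{
    // Unescaped '.' matches any single character except a newline.
    public static bool MatchesUnescaped(string input) =>
        Regex.IsMatch(input, @"^[a-z0-9-]+.[a-z]{2,6}$");

    // Escaped '\.' matches only a literal dot.
    public static bool MatchesEscaped(string input) =>
        Regex.IsMatch(input, @"^[a-z0-9-]+\.[a-z]{2,6}$");

    public static void Main()
    {
        Console.WriteLine(MatchesUnescaped("mail.com")); // True
        Console.WriteLine(MatchesUnescaped("mail/com")); // True: '.' accepted the '/'
        Console.WriteLine(MatchesEscaped("mail.com"));   // True
        Console.WriteLine(MatchesEscaped("mail/com"));   // False
    }
}
```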

With the . escaped, the following code produces

someemail@mail.com
email@email.com


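// Requires: using System.Diagnostics; using System.Text.RegularExpressions;
public void Test()
{
    // The negative lookbehind rejects a match that starts right after "mailto:".
    // Every literal dot is escaped as \. so it cannot match an arbitrary character.
    Regex pattern = new Regex(@"\b(?<!mailto:)([\w\-]+(\.[\w\-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)");
    MatchCollection matchCollection = pattern.Matches("ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com");
    foreach (Match match in matchCollection)
    {
        Debug.WriteLine(match);
    }
}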

A more real-world implementation of what you are trying to do might look like this:

Regex pattern = new Regex(@"(?<!mailto\:)\b[\w\-]+@[a-z0-9-]+(\.[a-z0-9\-]+)*\.[a-z]{2,8}\b(?!\<\/a)");
MatchCollection matchCollection = pattern.Matches("ttt <a href='mailto:so1meone@example.com'>someemail@mail.com</a> abc email@email.com");
foreach (Match match in matchCollection)
{
    Debug.WriteLine(match);
}
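The snippet above only lists the matches. As a rough sketch of the remaining step, wrapping each match in a mailto anchor, something like the following could be used with `Regex.Replace`. The class name `EmailLinker`, the method `Linkify`, and the two extra lookarounds `(?<![\w.-])` and `(?!\.?[\w-])` are assumptions added here so that a match cannot begin or end in the middle of an address. The IP address and port alternatives from the question are left out for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; a sketch under the assumptions stated above.
public static class EmailLinker
{
    // (?<![\w.-])   : do not start in the middle of an address (assumption)
    // (?<!mailto:)  : skip an address that is an href value
    // (?!\.?[\w-])  : do not stop in the middle of an address (assumption)
    // (?!</a)       : skip an address immediately followed by a closing anchor
    private static readonly Regex Pattern = new Regex(
        @"(?<![\w.-])(?<!mailto:)[\w-]+(\.[\w-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,8}(?!\.?[\w-])(?!</a)",
        RegexOptions.IgnoreCase);

    // Wraps every remaining bare address in a mailto anchor.
    public static string Linkify(string content) =>
        Pattern.Replace(content, m => $"<a href='mailto:{m.Value}'>{m.Value}</a>");

    public static void Main()
    {
        string sContent = "ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com";
        // Only the trailing email@email.com is wrapped; the existing anchor is unchanged.
        Console.WriteLine(Linkify(sContent));
    }
}
```

This only checks the text directly next to each address, so an address that sits inside an anchor but is separated from `</a>` by other words would still be wrapped.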


Sorry, you are correct; I hadn't considered that the negative assertion wouldn't be greedy enough.

\b(?!mailto\:)([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

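should work, although a lookahead in this position only inspects the text at the start of the candidate match, so it is worth checking against the sample text.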
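Lookarounds only inspect the text directly around a candidate match. A different technique, not the one used in the answer, is to let the pattern consume whole existing anchor elements first and replace only the bare addresses that remain. The sketch below assumes simple, non-nested `<a ...>...</a>` markup. The class name `AnchorAwareLinker` and the group name `email` are illustrative, and a real HTML parser would be more reliable for arbitrary pages.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; a sketch of the "match and skip" alternation technique.
public static class AnchorAwareLinker
{
    // First alternative: an entire existing anchor element (matched, then left alone).
    // Second alternative: a bare address, captured in the named group "email".
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*>.*?</a>|(?<email>[\w-]+(\.[\w-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,8})",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string content) =>
        Pattern.Replace(content, m =>
            m.Groups["email"].Success
                ? $"<a href='mailto:{m.Value}'>{m.Value}</a>" // bare address: wrap it
                : m.Value);                                   // existing anchor: keep as is

    public static void Main()
    {
        string sContent = "ttt <a href='mailto:someone@example.com'>someemail@mail.com</a> abc email@email.com";
        Console.WriteLine(Linkify(sContent));
    }
}
```

Because the whole anchor is consumed by the first alternative, an address anywhere inside it, in the href or in the link text, is never seen by the second alternative.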
