Regex to find email address from a String

Problem description

My intention is to get email addresses from a web page. I have the page source and am reading it line by line. Now I want to get the email address from the line I am currently reading. This line may or may not contain an email. I have seen a lot of regexp examples, but most of them are for validating an email address. I want to extract email addresses from the page source, not validate them. It should work the way http://emailx.discoveryvip.com/ works.

Some example input lines are:

1)<p>Send details to <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%72%65%62%65%6b%61%68@%68%61%63%6b%73%75%72%66%65%72.%63%6f%6d">neeraj@yopmail.com</a></p>

2)<p>Interested should send details directly to <a href="http://www.abcdef.com/abcdef/">www.abcdef.com/abcdef/</a>. Should you have any questions, please email <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%6a%6f%62%73@%72%65%6c%61%79.%65%64%75">neeraj@yopmail.com</a>.

3)Note :- Send your queries at  neeraj@yopmail.com  for more details call Mr. neeraj 012345678901.

I want to get neeraj@yopmail.com from examples 1, 2 and 3. I am using Java and I am not good at regexp. Please help.

Solution

You can validate e-mail address formats according to RFC 2822 with the regex below. It uses lowercase character classes only, so it needs to be applied case-insensitively:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

and here's an explanation from regular-expressions.info:

This regex has two parts: the part before the @, and the part after the @. There are two alternatives for the part before the @: it can either consist of a series of letters, digits and certain symbols, including one or more dots. However, dots may not appear consecutively or at the start or end of the email address. The other alternative requires the part before the @ to be enclosed in double quotes, allowing any string of ASCII characters between the quotes. Whitespace characters, double quotes and backslashes must be escaped with backslashes.
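The rules for the part before the @ can be checked in isolation. The following is a minimal Java sketch, not part of the original answer: the class name LocalPartRules and the method isValidLocalPart are hypothetical, and the two sub-patterns are the two local-part alternatives copied from the regex above, re-escaped for a Java string literal.

```java
import java.util.regex.Pattern;

public class LocalPartRules {
    // Alternative 1: runs of letters, digits and symbols separated by single dots.
    static final String DOT_ATOM =
        "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*";

    // Alternative 2: a double-quoted string; whitespace, quotes and
    // backslashes are only accepted when preceded by a backslash.
    static final String QUOTED =
        "\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]"
        + "|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\"";

    static final Pattern LOCAL_PART = Pattern.compile(
        "(?:" + DOT_ATOM + "|" + QUOTED + ")", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the whole string is an acceptable local part.
    static boolean isValidLocalPart(String s) {
        return LOCAL_PART.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidLocalPart("john.smith"));         // true
        System.out.println(isValidLocalPart("john..smith"));        // false: consecutive dots
        System.out.println(isValidLocalPart(".john"));              // false: leading dot
        System.out.println(isValidLocalPart("\"john\\ smith\""));   // true: escaped space in quotes
        System.out.println(isValidLocalPart("\"john smith\""));     // false: unescaped space
    }
}
```

matches() is used here because the goal is to test a whole candidate string; scanning a longer line for addresses calls for find() instead.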

And you can check this out here: Rubular example.
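Since the question is about extracting addresses from a line in Java rather than validating them, here is a minimal sketch of how that might look. It is an illustration under stated assumptions, not the answerer's code: the class EmailExtractor and the method extractEmails are hypothetical names, and the pattern is a simplified form of the regex above that keeps only the unquoted local part and the domain-name form (it drops the quoted-string and square-bracket alternatives).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {
    // Simplified pattern (assumption): unquoted local part + "@" + domain name.
    private static final String LOCAL =
        "[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*";
    private static final String DOMAIN =
        "(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?";
    private static final Pattern EMAIL =
        Pattern.compile(LOCAL + "@" + DOMAIN, Pattern.CASE_INSENSITIVE);

    // Returns every substring of the line that looks like an email address.
    public static List<String> extractEmails(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(line);
        while (m.find()) {          // find() scans the line; matches() would test the whole line
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<p>Send details to <a href=\"&#109;&#97;&#105;&#108;&#116;&#111;&#58;"
            + "%72%65%62%65%6b%61%68@%68%61%63%6b%73%75%72%66%65%72.%63%6f%6d\">"
            + "neeraj@yopmail.com</a></p>";
        String plain = "Note :- Send your queries at  neeraj@yopmail.com  for more details "
            + "call Mr. neeraj 012345678901.";
        System.out.println(extractEmails(html));   // [neeraj@yopmail.com]
        System.out.println(extractEmails(plain));  // [neeraj@yopmail.com]
    }
}
```

The percent-encoded address inside the href is not reported, because the text after its @ starts with %, which the domain part does not accept. Decoding the href first would be a separate step that this sketch does not attempt.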
