Find email addresses using a regular expression in Python


Question

I want to find valid email addresses in a text file, and this is my code:

email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+',line)

But my code obviously does not match email addresses that have digits before the @ sign. It also cannot reject email addresses that lack a valid ending. Could anyone help me with these two problems? Thank you!

An example of my problem:

My code can find this email: xyz@gmail.com

But it cannot find this one: xyz123@gmail.com

And it cannot filter this one out either: xyz@gmail

Recommended answer

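According to the Python re docs, \w matches any alphanumeric character plus the underscore. With the re.ASCII flag (or a bytes pattern) it is equivalent to the set [a-zA-Z0-9_]; for str patterns in Python 3 it also matches Unicode word characters. So [\w\.-] matches digits as well as letters, which fixes the first problem. Requiring at least one dot followed by word characters at the end of the domain fixes the second: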

email = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.\w+)+', line)  # (?:...) is non-capturing, so findall returns the whole address
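As a quick check, here is a minimal sketch that runs the pattern against the three addresses from the question. The sample `line` string and the name `EMAIL_PATTERN` are made up for illustration. The group at the end of the pattern is written as non-capturing, `(?:...)`, because `re.findall` returns only the group contents when a pattern contains a capturing group.

```python
import re

# Non-capturing group (?:...) so that findall returns whole matches.
EMAIL_PATTERN = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.\w+)+')

# Hypothetical sample line built from the addresses in the question.
line = "xyz@gmail.com, xyz123@gmail.com, xyz@gmail"

emails = EMAIL_PATTERN.findall(line)
print(emails)  # ['xyz@gmail.com', 'xyz123@gmail.com']

# With a capturing group, findall returns only the last captured piece.
captured = re.findall(r'[\w\.-]+@[\w\.-]+(\.[\w]+)+', line)
print(captured)  # ['.com', '.com']
```

The address xyz@gmail is skipped because nothing after the @ sign supplies the required dot followed by word characters.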

This post discusses matching email addresses much more extensively, and there are a few more pitfalls that your code fails to catch. For example, an email address cannot be made up entirely of punctuation (...@....). There is also often a maximum length on addresses, depending on the email server. Finally, many email servers accept non-English characters. So, depending on your needs, you may need a more comprehensive pattern.
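Two of these pitfalls can be handled with a small filter applied after the regex match. The sketch below is only one possible approach, not a full validator: the helper name `find_plausible_emails` is hypothetical, and the default limit of 254 characters is an assumption that should be adjusted to whatever your mail server actually enforces.

```python
import re

# Same pattern as in the answer, with a non-capturing group for findall.
CANDIDATE = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.\w+)+')


def find_plausible_emails(text, max_length=254):
    """Return regex matches that also pass two simple sanity checks.

    max_length=254 is an assumed limit, not something the pattern enforces.
    """
    results = []
    for candidate in CANDIDATE.findall(text):
        # The pattern guarantees exactly one '@' in each candidate.
        local, _, domain = candidate.partition('@')
        # Skip addresses longer than the assumed maximum.
        if len(candidate) > max_length:
            continue
        # Skip addresses whose local part or domain has no letter or digit.
        if not any(ch.isalnum() for ch in local):
            continue
        if not any(ch.isalnum() for ch in domain):
            continue
        results.append(candidate)
    return results


print(find_plausible_emails("---@gmail.com a.b@example.org"))
# ['a.b@example.org']
```

Because \w matches Unicode word characters for str patterns in Python 3, this sketch also treats addresses with non-English letters as candidates; passing re.ASCII to re.compile would restrict it to ASCII.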
