Find email addresses using a regular expression in Python
Problem description
I want to find valid email addresses in a text file. This is my code:
email = re.findall(r'[a-zA-Z\.-]+@[\w\.-]+',line)
However, my pattern does not match email addresses that have digits before the @ sign. It also cannot reject addresses that lack a valid ending. Could anyone help me with these two problems? Thank you!
An example of my problem:
My code can find this email: xyz@gmail.com
but it cannot find this one: xyz123@gmail.com
And it cannot filter this one out either: xyz@gmail
Recommended answer
From the Python re docs, \w matches any alphanumeric character and the underscore, equivalent to the set [a-zA-Z0-9_] for ASCII patterns. So [\w\.-] will match digits as well as letters, which covers addresses such as xyz123@gmail.com. The pattern below also requires at least one dot followed by word characters at the end of the domain, so xyz@gmail is no longer matched.
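# (?:...) is a non-capturing group, so findall returns whole addresses rather than only the group's text
email = re.findall(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+', line)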
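As a quick check against the three examples from the question, the following sketch runs the pattern over a sample string. The names `EMAIL_RE` and `sample` are illustrative and do not come from the original post. The trailing group is written as non-capturing, since `re.findall` returns only the group contents when a pattern contains a capturing group.

```python
import re

# Non-capturing group (?:...) so that findall returns whole matches.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+')

# Hypothetical input line containing the three addresses from the question.
sample = "contacts: xyz@gmail.com, xyz123@gmail.com, xyz@gmail"

emails = EMAIL_RE.findall(sample)
print(emails)  # ['xyz@gmail.com', 'xyz123@gmail.com']
```

The address without a dot-separated ending, xyz@gmail, is skipped, while the one with digits before the @ sign is found.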
This post discusses matching email addresses much more extensively, and there are a few more pitfalls in matching email addresses that your code fails to catch. For example, email addresses cannot be made up entirely of punctuation (...@....). Additionally, there is often a maximum length on addresses, depending on the email server. Also, many email servers accept non-English characters. So depending on your needs you may need a more comprehensive pattern.
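One way to handle some of these pitfalls is to keep the regular expression simple and apply extra checks to each candidate afterwards. The sketch below is only an illustration under stated assumptions: `find_emails` is a hypothetical helper, and the 254-character limit is an assumed default, since the actual maximum depends on the email server.

```python
import re

# Same base pattern as above, with a non-capturing group.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+')

MAX_LENGTH = 254  # assumed limit; real servers may enforce a different one


def find_emails(text, max_length=MAX_LENGTH):
    """Return candidate addresses in text that pass a few extra checks."""
    results = []
    for candidate in EMAIL_RE.findall(text):
        local, _, domain = candidate.partition('@')
        # Reject parts made up only of punctuation, such as "...@gmail.com".
        if not any(ch.isalnum() for ch in local):
            continue
        if not any(ch.isalnum() for ch in domain):
            continue
        # Reject candidates longer than the assumed maximum length.
        if len(candidate) > max_length:
            continue
        results.append(candidate)
    return results


print(find_emails("ok: xyz123@gmail.com, bad: ...@gmail.com"))
# ['xyz123@gmail.com']
```

In Python 3, \w in a str pattern also matches non-English letters, so this pattern already accepts many internationalized addresses. It is still far from a full validator.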