Regular Expression in Base R Regex to identify email address
Question
I am trying to use the stringr library to extract emails from a big, messy file.
str_match doesn't allow perl=TRUE, and I can't figure out the escape characters to get it to work.
Can someone recommend a relatively robust regex that would work in the context below?
library(stringr)
c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")->emails
"SomeRegex"->regex
str_match(emails, regex)
Recommended answer
> "^[[:alnum:]._-]+@[[:alnum:].-]+$"->regex
> str_match(emails, regex)
     [,1]
[1,] "larry@gmail.com"
[2,] "larry-sally@sally.com"
[3,] "larry@sally.larry.com"
The @ sign does not need escaping in a regex. Inside a character class, "." is not special, and "-" is literal only when it is the first or last character of the class; anywhere else it defines a range. If you want to require an ending such as ".com", ".co", ".edu", or ".org", then you should specify how complete that list needs to be.
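As a rough illustration of both points, the sketch below uses base R's grepl so it runs without any packages. The hyphen sits last in each class so that it is literal, and the four-entry ending list is only a hypothetical example, not a complete set. The names tld_regex, candidates, and ok, and the two extra test strings, are made up for this sketch.

```r
# Sample addresses from the question.
emails <- c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")

# The hyphen is the last character in each class, so it is a literal
# hyphen rather than a range operator.
regex <- "^[[:alnum:]._-]+@[[:alnum:].-]+$"
all_match <- grepl(regex, emails)

# Hypothetical restriction to a short, incomplete list of endings.
# "\\." is a literal dot outside a character class.
tld_regex <- "^[[:alnum:]._-]+@[[:alnum:].-]+\\.(com|co|edu|org)$"

# Two extra strings (assumptions for this sketch) that should be rejected.
candidates <- c(emails, "larry@sally.net", "not an email")
ok <- grepl(tld_regex, candidates)

print(all_match)
print(data.frame(candidates, ok))
```

Whether to restrict the ending at all is a design choice: a short list rejects valid addresses under other domains, which is why the answer asks how complete the list must be.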
As M42 pointed out, this is not a surefire method. In fact, it has been claimed that no surefire method exists; see the question "Using a regular expression to validate an email address".
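Since the question is about pulling addresses out of a messy file rather than validating whole strings, an unanchored pattern may be closer to what is needed. The following is a minimal sketch in base R under stated assumptions: the messy vector is invented sample input, and the pattern is a deliberately loose approximation that will both miss some valid addresses and accept some invalid ones.

```r
# Hypothetical messy input; in practice this might come from readLines().
messy <- c(
  "Contact larry@gmail.com, or <larry-sally@sally.com>;",
  "no address here",
  "cc: larry@sally.larry.com."
)

# No ^ or $ anchors, so the pattern can match inside longer text.
# The domain must contain at least one dot followed by more characters,
# which keeps a trailing sentence period out of the match.
pattern <- "[[:alnum:]._-]+@[[:alnum:]-]+(\\.[[:alnum:]-]+)+"

# gregexpr() finds every match per element; regmatches() extracts them.
found <- unlist(regmatches(messy, gregexpr(pattern, messy)))
print(found)
```

With stringr loaded, str_extract_all(messy, pattern) should give a comparable result as a list with one element per input string.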