Extract e-mail addresses from a string using R

Question

These are 5 Twitter user descriptions. The idea is to extract the e-mail address from each string.

This is the code I've tried. It works, but there is probably something better. I'd rather avoid using unlist() and do it in one go with a regex. I've seen similar questions for Python/Perl/PHP but not for R. I know I could use grep(..., perl = TRUE), but that shouldn't be the only way to do it. If it works, of course it helps.

ds <- c("#MillonMusical | #PromotorMusical | #Diseñador | Contacto :        ezequielife@gmail.com | #Instagram : Ezeqielgram | 01-11-11 |           @_MillonMusical @flowfestar", "LipGLosSTudio by: SAndry RUbio           Maquilladora PRofesional estudiande de diseño profesional de maquillaje     artistico lipglosstudio@hotmail.com/", "Medico General Barranquillero   radicado con su familia en Buenos Aires para iniciar Especialidad       Medico Quirurgica. email jaenpavi@hotmail.com", "msn =
    rdt031169@hotmail.comskype = ronaldotorres-br", "Aguante piscis /       manuarias17@gmail.com  buenos aires"
    )

ds <- unlist(strsplit(ds, ' '))
ds <- ds[grep("mail.", ds)]

> print(ds)
[1] "\t\tezequielife@gmail.com"  "lipglosstudio@hotmail.com/"
[3] "jaenpavi@hotmail.com"       "rdt031169@hotmail.comskype"
[5] "/\t\tmanuarias17@gmail.com"

It would be nice to separate this one, "rdt031169@hotmail.comskype", perhaps by requiring the address to end in .com or .com.ar, which would make sense for what I'm working on.

Recommended answer

Here is an alternative:

> regmatches(ds, regexpr("[[:alnum:]]+\\@[[:alpha:]]+\\.com", ds))
[1] "ezequielife@gmail.com"     "lipglosstudio@hotmail.com" "jaenpavi@hotmail.com"      "rdt031169@hotmail.com"    
[5] "manuarias17@gmail.com" 
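
One thing to keep in mind with this approach: regexpr() reports only the first match in each string (and -1 where there is none), and regmatches() silently drops the elements that did not match, so the result can be shorter than the input. A minimal sketch of how to keep one result per input element follows. The sample strings and the names x, m and emails are made up for illustration and are not part of the question's data.

```r
# Hypothetical sample data (not the strings from the question)
x <- c("contact: someone@example.com | more text",
       "no address here",
       "msn = other@example.comskype")

# Same pattern as in the answer above ("@" needs no escaping)
pattern <- "[[:alnum:]]+@[[:alpha:]]+\\.com"

# regexpr() gives the start of the first match per element, -1 if none
m <- regexpr(pattern, x)

# regmatches() returns only the matched pieces, so this has length 2, not 3
regmatches(x, m)

# Keep alignment with the input: NA where nothing matched
emails <- rep(NA_character_, length(x))
emails[m != -1] <- regmatches(x, m)
emails
```

This keeps the output the same length as the input, which is convenient if the addresses are to be stored next to the original descriptions, for example as a data frame column.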

Based on @Frank's comment, if you want to keep the country identifier after .com, as in your example .com.ar, then look at this:

> ds <- c(ds, "fulanito13@somemail.com.ar")  # a new e-mail address
> regmatches(ds, regexpr("[[:alnum:]]+\\@[[:alpha:]]+\\.com(\\.[a-z]{2})?", ds))
[1] "ezequielife@gmail.com"      "lipglosstudio@hotmail.com"  "jaenpavi@hotmail.com"       "rdt031169@hotmail.com"     
[5] "manuarias17@gmail.com"      "fulanito13@somemail.com.ar"
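
The patterns above only allow letters and digits before the @, so an address such as first.last@example.com would be cut down to last@example.com, and only the first address in each string is returned. The sketch below is one possible variation, not part of the original answer: it assumes dots, underscores and hyphens are acceptable in the local part, and it uses gregexpr() to collect every match per string. The sample strings and the names x, pattern and found are hypothetical.

```r
# Hypothetical sample data (not the strings from the question)
x <- c("write to first.last@example.com or backup_user@example.com.ar",
       "nothing to see")

# Assumption: local part may contain letters, digits, ".", "_" and "-";
# the domain must end in .com, optionally followed by a two-letter country code
pattern <- "[[:alnum:]._-]+@[[:alnum:]-]+\\.com(\\.[a-z]{2})?"

# gregexpr() finds all matches; regmatches() then returns a list
# with one character vector per input element
found <- regmatches(x, gregexpr(pattern, x))
found
```

Because the result is a list, elements without any address show up as character(0) rather than being dropped. This is still far from a full e-mail validator; it only covers the .com and .com.xx endings that the question asks for.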
