R gsub to extract emails from text

Question

I have a variable a created by readLines of a file which contains some emails. I already filtered only those rows with the @ symbol, and now I am struggling to grab the emails. The text in my variable looks like this:

> dput(a[1:5])
c("buenas tardes. excelente. por favor a: Saolonm@hotmail.com", 
"26.leonard@gmail.com ", "Aprecio tu aporte , mi correo es jcdavola31@gmail.com , Muchas Gracias", 
"gracias andrescarnederes@headset.cl", "Me apunto, muchas gracias mi dirección luciana.chavela.ecuador@gmail.com me será de mucha utilidad. "
)

From this question on SO I got a starting point for extracting the emails (@Aaron Haurun's answer). Slightly modified (I added a [\w.] before the @ to handle emails with a . between names), it worked well on regex101.com. However, it fails when I port it to gsub:

> gsub("()(\\w[\\w.]+@[\\w.-]+|\\{(?:\\w+, *)+\\w+\\}@[\\w.-]+)()", 
       "\\2", 
       a[1:5], 
       perl = FALSE) ## It doesn't matter if I use perl = TRUE

[1] "buenas tardes. excelente. por favor a: Saolonm@hotmail.com"           "26.leonard@gmail.com "                                                                          
[3] "Aprecio tu aporte , mi correo es jcdavola31@gmail.com , Muchas Gracias"                           "gracias andrescarnederes@headset.cl"                                                                       
[5] "Me apunto, muchas gracias mi dirección luciana.chavela.ecuador@gmail.com me será de mucha utilidad. "

What am I doing wrong, and how can I grab those emails? Thanks!

Recommended answer

We can try str_extract() from the stringr package:

library(stringr)
str_extract(text, "\\S*@\\S*")  # text: the character vector from the question, e.g. a[1:5]

[1] "Saolonm@hotmail.com"              
[2] "26.leonard@gmail.com"             
[3] "jcdavola31@gmail.com"             
[4] "andrescarnederes@headset.cl"      
[5] "luciana.chavela.ecuador@gmail.com"

where \\S* matches any number (zero or more) of non-whitespace characters, so the pattern returns the first whitespace-delimited token that contains an @.
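
As for why the gsub call in the question returns its input unchanged: gsub only replaces the part of each string that the pattern matches, and replacing a match with a backreference to that same match is a no-op. Here is a minimal base R sketch of two ways around that, assuming each line holds one address and that a simplified \\S+@\\S+ pattern is good enough (the names unchanged, emails_sub and emails_rm are just illustrative):

```r
# Sample lines copied from the question's dput() output (first four elements)
a <- c("buenas tardes. excelente. por favor a: Saolonm@hotmail.com",
       "26.leonard@gmail.com ",
       "Aprecio tu aporte , mi correo es jcdavola31@gmail.com , Muchas Gracias",
       "gracias andrescarnederes@headset.cl")

# Replacing a match with a backreference to the whole match changes nothing
unchanged <- gsub("(\\S+@\\S+)", "\\1", a)
print(identical(unchanged, a))

# Option 1: make the pattern span the whole string and keep only the group
emails_sub <- sub("^.*?(\\S+@\\S+).*$", "\\1", a, perl = TRUE)

# Option 2: regexpr() finds the first match per string, regmatches() extracts it
emails_rm <- regmatches(a, regexpr("\\S+@\\S+", a))

print(emails_sub)
print(emails_rm)
```

Note that the two options differ on lines without a match: sub returns such a line unchanged, while regmatches drops it from the result. Both also keep any punctuation that is attached directly to the address, because \\S matches every non-whitespace character.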
