Extracting email addresses in an HTML block in Ruby/Rails


Question

I am creating a parser that guards against spamming and email harvesting in a block of text that comes from TinyMCE (so it may or may not contain HTML tags).

I've tried regexes, and so far this one has worked:

/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

The problem is that I need to ignore every email address that appears inside a mailto href. For example:

<a href="mailto:test@mail.com">test@mail.com</a>

Only the second email address (the link text) should be returned.

For background: I am reversing the email addresses in the block, so the example above should end up looking like this:

<a href="mailto:test@mail.com">moc.liam@tset</a>

The problem with my current regex is that it also replaces the address inside the href. Is there a way to do this with a single regex, or do I have to check for one pattern and then the other? Can I do this with gsub alone, or do I have to use Nokogiri/Hpricot to parse out the mailto links? Thanks in advance!

Here are my references:

so.com/questions/504860/extract-email-addresses-from-a-block-of-text

so.com/questions/1376149/regexp-for-extracting-a-mailto-address

I'm also testing with this:

http://rubular.com/

Edit:

Here's my current helper code:

def email_obfuscator(text)
  text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m|
    m = "<span class='anti-spam'>#{m.reverse}</span>"
  }
end

This results in:

<a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>


Recommended answer

Another option if lookbehind doesn't work:

/\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

This matches every email address. You can then check whether the first captured group is "mailto:" and, if it is, skip that match.
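As a rough sketch of how that check could be wired into the helper from the question: the block passed to gsub returns the match unchanged when the first group captured "mailto:", and otherwise wraps the reversed address in the span. The constant name EMAIL_WITH_OPTIONAL_MAILTO is made up for illustration, and the pattern is the simplified one used above, so it only covers top-level domains of 2 to 4 letters.

```ruby
# Group 1 is the optional "mailto:" prefix, group 2 is the address itself.
# (Hypothetical constant name; the pattern is the one given in the answer.)
EMAIL_WITH_OPTIONAL_MAILTO = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

def email_obfuscator(text)
  text.gsub(EMAIL_WITH_OPTIONAL_MAILTO) do |match|
    prefix  = Regexp.last_match(1)
    address = Regexp.last_match(2)

    if prefix
      # The address sits inside a mailto: href, so return it untouched.
      match
    else
      # A bare address in the visible text: reverse it and wrap it.
      "<span class='anti-spam'>#{address.reverse}</span>"
    end
  end
end

html = '<a href="mailto:test@mail.com">test@mail.com</a>'
puts email_obfuscator(html)
```

Because the optional group is tried first, an address that directly follows "mailto:" is always consumed together with its prefix, so it cannot be matched a second time as a bare address.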
