Extract all email addresses from some .txt documents using Ruby

Question

I have to extract all email addresses from some .txt documents. The addresses may appear in these formats:

  1. a@abc.com
  2. {a, b, c}@abc.edu
  3. Some other formats that include an @ sign.

I chose Ruby as my first language for writing this program, but I don't know how to write the regex. Could someone help me? Thank you!

Recommended answer

Depending on the nature of your .txt documents, you don't need one of the complicated regexes that attempt to validate email addresses. You're not validating anything; you're just grabbing what's already there. Generally speaking, a regex that grabs existing text can be much simpler than one that has to validate input.

An important question is whether your .txt documents contain @ signs that are not part of an email address you want to extract.

This regex handles your first two requirements:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
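
As a minimal sketch of how this regex could be applied in Ruby, the snippet below passes it to String#scan, which returns every match in order. The names EMAIL_PATTERN and extract_emails and the sample string are assumptions made for illustration; they are not part of the original answer.

```ruby
# Regex from the answer: a plain address, or a brace group such as {a, b, c}@abc.edu.
EMAIL_PATTERN = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: returns every substring of +text+ matched by the pattern,
# in order of appearance. The pattern has no capturing groups, so scan yields
# whole matches as plain strings.
def extract_emails(text)
  text.scan(EMAIL_PATTERN)
end

sample = "Contact a@abc.com or {a, b, c}@abc.edu for details"
puts extract_emails(sample)
```

Because the domain part is matched with [\w.-]+, an address that ends a sentence will pick up the trailing period, so some cleanup of the results may still be needed.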

Or, if you want to allow any sequence of non-space characters containing an @ sign, plus your second requirement (which contains spaces):

\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+
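
The looser regex could be run over a directory of .txt files roughly as follows. This is only a sketch under stated assumptions: the helper names expand_group and emails_in_dir are hypothetical, and expanding {a, b, c}@abc.edu into three separate addresses is one reading of the second format in the question, not something the answer specifies.

```ruby
# Looser regex from the answer: any run of non-space characters around an @ sign,
# or a brace group such as {a, b, c}@abc.edu.
LOOSE_PATTERN = /\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: turns "{a, b, c}@abc.edu" into
# ["a@abc.edu", "b@abc.edu", "c@abc.edu"] (an assumed interpretation).
# Anything that is not a brace group comes back unchanged in a one-element array.
def expand_group(match)
  group = match.match(/\A\{(.+)\}@(.+)\z/)
  return [match] unless group

  group[1].split(/, */).map { |name| "#{name}@#{group[2]}" }
end

# Hypothetical helper: scans every .txt file directly inside +dir+ and
# returns the unique addresses found, with brace groups expanded.
def emails_in_dir(dir)
  Dir.glob(File.join(dir, "*.txt")).sort.flat_map do |path|
    File.read(path).scan(LOOSE_PATTERN).flat_map { |m| expand_group(m) }
  end.uniq
end
```

Calling emails_in_dir("some/folder") would return an array of strings. Since \S+ accepts any non-space character, matches can include surrounding punctuation such as a trailing comma or closing parenthesis.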
