What is an efficient way to find a pattern in a large text?

Views: 36 | Published: 2021/9/6 19:10:27 | Tags: regex, text


Question

I want to extract email addresses from a large text file. What is the best way to do it?

My idea is to find each '@' in the text and then run a regex over a substring around it, for example a window that starts 256 characters before that position and is 512 characters long.
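The windowed idea described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `find_emails_windowed`, the deliberately simple `EMAIL_RE` pattern, and the default window sizes (taken from the 256/512 figures in the question) are not part of the original post, and real address syntax is considerably broader than this pattern.

```python
import re

# Assumption: a deliberately simple pattern, not a full address grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails_windowed(text, before=256, length=512):
    """Locate each '@' and run the regex only on a window around it."""
    found = []
    pos = text.find("@")
    while pos != -1:
        start = max(0, pos - before)
        window = text[start:start + length]
        at_in_window = pos - start
        for match in EMAIL_RE.finditer(window):
            # Keep only the match that covers this particular '@'.
            if match.start() <= at_in_window < match.end():
                found.append(match.group())
                break
        pos = text.find("@", pos + 1)
    return found


if __name__ == "__main__":
    sample = "contact alice@example.com or bob.smith@mail.example.org now"
    print(find_emails_windowed(sample))
```

Note that an address longer than the window, or one cut off by the window edge, would be truncated or missed in this sketch.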

P.S.: Put plainly, I want to know the best and most efficient way to find a pattern (such as email addresses) in a huge text.

Recommended answer

If you absolutely need the most efficient approach, I don't think you should use regular expressions.

Assuming almost every @ in your text belongs to an email address, and you are working in a language with fast forward and backward string traversal, this method will probably be close to the fastest:

1. Search for @.
2. Manually compare each character after the @ to make sure it falls within the allowed ASCII ranges.
3. Keep track of whether a valid domain was found before the first space or other valid terminating character.
4. Search again from the @ symbol backwards, comparing each character to make sure it falls within the valid character ranges for the local part.
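The steps above might look something like the following in Python. Treat it as a minimal sketch rather than the answerer's implementation: the function name `scan_emails`, the character sets `LOCAL_CHARS` and `DOMAIN_CHARS`, and the rule that a domain needs at least two non-empty dot-separated labels are all assumptions made for illustration, and they are much stricter than what real address syntax permits.

```python
import string

# Assumed character sets; real address syntax allows more than this.
LOCAL_CHARS = frozenset(string.ascii_letters + string.digits + "._%+-")
DOMAIN_CHARS = frozenset(string.ascii_letters + string.digits + ".-")


def scan_emails(text):
    """Find candidate addresses by scanning outward from each '@'."""
    found = []
    n = len(text)
    at = text.find("@")
    while at != -1:
        # Walk forward until the first character that cannot be in a domain.
        end = at + 1
        while end < n and text[end] in DOMAIN_CHARS:
            end += 1
        # Trailing '.' or '-' is treated as sentence punctuation, not domain.
        domain = text[at + 1:end].rstrip(".-")

        # Walk backward over characters allowed in the local part.
        start = at
        while start > 0 and text[start - 1] in LOCAL_CHARS:
            start -= 1
        local = text[start:at].lstrip(".")

        # Assumed validity rule: non-empty local part, two or more
        # non-empty domain labels.
        labels = domain.split(".")
        if local and len(labels) >= 2 and all(labels):
            found.append(local + "@" + domain)

        at = text.find("@", at + 1)
    return found


if __name__ == "__main__":
    sample = "Mail alice@example.com, or bob.smith@mail.example.org."
    print(scan_emails(sample))
```

For a file too large to hold in memory, the same scan would have to run over chunks with some overlap so that an address spanning a chunk boundary is not lost; that part is not shown here.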
