What's the best way of parsing strings?

Question

We've got a scenario that requires us to parse lots of plain-text e-mails. Each e-mail 'type' is the output of a script run against one of various platforms. Some are tab-delimited, some are space-delimited, and some we simply don't know yet.

We'll need to support more 'formats' in the future too.

Do we go for a solution using:


  • Regex
  • Simple string searching (using string.IndexOf etc.)
  • Lex/Yacc
  • Other

The overall solution will be developed in C# 2.0 (hopefully 3.5).

Recommended answer

The three solutions you listed each cover very different needs.

Manual parsing (simple text search) is the most flexible and the most adaptable; however, it very quickly becomes a real pain in the ass as the required parsing gets more complicated.
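
A minimal sketch of manual parsing, written in Python rather than C#: Python's `str.find` plays the role of `string.IndexOf` and likewise returns -1 when nothing is found. The helper name `extract_field` and the one-`Label: value`-per-line layout are hypothetical, chosen only to illustrate the approach:

```python
from typing import Optional


def extract_field(text: str, label: str) -> Optional[str]:
    """Return what follows `label` up to the end of its line, or None if absent.

    Hypothetical helper: assumes one 'Label: value' pair per line.
    """
    start = text.find(label)  # like string.IndexOf in C#
    if start == -1:
        return None
    start += len(label)
    end = text.find("\n", start)
    if end == -1:  # label sits on the last line, with no trailing newline
        end = len(text)
    return text[start:end].strip()


body = "Server: web01\nDisk usage: 87%\n"
print(extract_field(body, "Disk usage:"))  # 87%
print(extract_field(body, "Memory:"))      # None
```

Each new field or layout quirk means another round of index arithmetic like this, which is where the pain mentioned above comes from.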

Regexes are a middle ground, and probably your best bet here. They are powerful, yet flexible, since you can add more logic yourself in the code that calls the different regexes. The main drawback here would be speed.
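
A sketch of the "bag of regexes" idea, again in Python (in C# the same shape would be a list of `System.Text.RegularExpressions.Regex` objects). The two line formats and the `host`/`status`/`load` field names are made up for illustration; real patterns would come from the actual e-mails. Each pattern is compiled once, at module load, rather than on every call:

```python
import re
from typing import Dict, Optional

# Hypothetical formats: tab- and space-delimited "host status load" lines.
# Supporting a new format means appending one more (name, pattern) pair.
FORMATS = [
    ("tab", re.compile(r"^(?P<host>\S+)\t(?P<status>\S+)\t(?P<load>\d+(?:\.\d+)?)$")),
    ("space", re.compile(r"^(?P<host>\S+) +(?P<status>\S+) +(?P<load>\d+(?:\.\d+)?)$")),
]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Try each pattern in order; return the named fields of the first match."""
    for name, pattern in FORMATS:
        match = pattern.match(line)
        if match:
            record = match.groupdict()
            record["format"] = name
            return record
    return None  # unknown format: log it and add a pattern later


print(parse_line("web01\tOK\t0.75"))
print(parse_line("web02   DOWN  3"))
print(parse_line("no idea what this is"))
```

The surrounding code decides which patterns to try and what to do with the captured fields, which is the extra logic the answer refers to.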

Lex/Yacc is really only suited to very complicated but predictable syntaxes, and it lacks a lot of post-compile flexibility, because the grammar is turned into code at build time. You can't easily switch parsers mid-parse; well, actually you can, but it's just too heavy, and you'd be better off using regexes instead.
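
Lex itself has no counterpart in Python's standard library, so the sketch below imitates only the scanning stage, using a single regex built from named alternatives (the tokenizer technique shown in the `re` module documentation). It is not Lex/Yacc, and the token names are hypothetical; the point is that the token table is fixed before any input arrives, and a Yacc-style grammar would then be layered on top of these tokens:

```python
import re
from typing import Iterator, List, Tuple

# Token table, defined up front as in a Lex specification (names are illustrative).
# Order matters: at each position the first alternative that matches wins.
TOKEN_SPEC: List[Tuple[str, str]] = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SEP", r"[\t ]+"),
    ("NEWLINE", r"\n"),
    ("OTHER", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token kind, lexeme) pairs, dropping tab/space separators."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SEP":
            continue
        yield kind, match.group()


print(list(tokenize("web01\tOK 0.75")))
```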

I know this is a cliché answer, and it all really comes down to what your exact needs are, but from what you've said, I would personally probably go with a bag of regexes.

As an alternative, as Vaibhav pointed out, if several different situations can arise and you can easily detect which one is coming, you could build a plugin system that chooses the right algorithm. Those algorithms could all be very different: one using Lex/Yacc for the advanced cases, another using IndexOf and regexes for the simpler ones.
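
A minimal sketch of that plugin idea in Python, assuming each format can be recognised by a cheap check on the e-mail body. The registry, the `register` decorator and both detectors are hypothetical; in C# a similar structure might be an interface with one implementing class per format:

```python
from typing import Callable, List, Tuple

Detector = Callable[[str], bool]
Parser = Callable[[str], List[List[str]]]

# Hypothetical plugin registry: (name, detector, parser), checked in registration order.
_REGISTRY: List[Tuple[str, Detector, Parser]] = []


def register(name: str, detector: Detector) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser plus the check that selects it."""
    def wrap(parser: Parser) -> Parser:
        _REGISTRY.append((name, detector, parser))
        return parser
    return wrap


# Registered first because a tab-delimited body may also contain spaces.
@register("tab", lambda body: "\t" in body)
def parse_tab(body: str) -> List[List[str]]:
    return [line.split("\t") for line in body.splitlines() if line]


@register("space", lambda body: " " in body)
def parse_space(body: str) -> List[List[str]]:
    return [line.split() for line in body.splitlines() if line]


def parse_email(body: str) -> Tuple[str, List[List[str]]]:
    """Hand the body to the first plugin whose detector accepts it."""
    for name, detector, parser in _REGISTRY:
        if detector(body):
            return name, parser(body)
    raise ValueError("no plugin recognises this format yet")


print(parse_email("web01\tOK\nweb02\tDOWN\n"))  # ('tab', [['web01', 'OK'], ['web02', 'DOWN']])
```

A heavier parser (a generated Lex/Yacc one, say) would register the same way, so each format's code stays independent of the others.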
