Regex for getting only words from string


Question

Hi all.

Is it possible to split input text into a list of words? Words should contain only letters.

For example, given "This is Peter's 5th program." I want to get the words "This", "is", "Peter", "program".

Can I do it using only regular expressions? Or is it better to use myString.Split(' '), analyse each word, and remove signs etc.?

Thanks for any help.

Recommended answer

That is actually quite difficult, and probably not suited to a regex at all. The problem is that you want to accept the "Peter" from "Peter's" but discard "5th". What you really want to do is probably use a dictionary (a proper word list, rather than the .NET Dictionary class) and check for actual words. Otherwise, what are you going to do with "Peters'" or "it's"?
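To make the dictionary idea a bit more concrete, here is a minimal sketch of checking candidate tokens against a word list. The class name DictionaryCheckDemo, the method KeepKnownWords, and the tiny in-memory word list are all hypothetical stand-ins; a real solution would presumably load a full word list from a file, and whether names like "Peter" are in it depends on the list you pick.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryCheckDemo
{
    // Hypothetical stand-in for a real word list loaded from a file.
    // The comparer makes lookups case-insensitive ("This" matches "this").
    public static readonly HashSet<string> KnownWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "this", "is", "it", "program", "Peter"
        };

    // Keeps only the candidates that appear in the word list.
    public static List<string> KeepKnownWords(IEnumerable<string> candidates)
    {
        return candidates.Where(c => KnownWords.Contains(c)).ToList();
    }

    public static void Main()
    {
        // Candidates as they might look after splitting on spaces and apostrophes.
        var candidates = new[] { "This", "is", "Peter", "s", "5th", "program" };
        Console.WriteLine(string.Join(", ", KeepKnownWords(candidates)));
        // prints: This, is, Peter, program
    }
}
```

The lookup itself is trivial; the hard part, as noted above, is deciding how to produce the candidates from inputs like "Peters'" or "it's" in the first place.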


I would start with Split. This problem is not really a good fit for Regex, and a regex solution would also be hard to maintain.

--SA
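A Split-based approach could look roughly like the sketch below. The names SplitWordsDemo and ExtractWords are made up for illustration, and the rules it applies (trim common trailing punctuation, cut the token at the first apostrophe, keep it only if every remaining character is a letter) are assumptions chosen to fit the example sentence, not something stated in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitWordsDemo
{
    public static List<string> ExtractWords(string text)
    {
        var words = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n' };

        // RemoveEmptyEntries skips the empty tokens produced by repeated spaces.
        foreach (string token in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Assumption: only these trailing punctuation marks need trimming.
            string candidate = token.TrimEnd('.', ',', ';', ':', '!', '?');

            // Assumption: anything from the first apostrophe on is dropped,
            // so "Peter's" becomes "Peter" and "it's" becomes "it".
            int apostrophe = candidate.IndexOf('\'');
            if (apostrophe >= 0)
            {
                candidate = candidate.Substring(0, apostrophe);
            }

            // Tokens containing digits or other non-letters (such as "5th") are rejected.
            if (candidate.Length > 0 && candidate.All(char.IsLetter))
            {
                words.Add(candidate);
            }
        }
        return words;
    }

    public static void Main()
    {
        var words = ExtractWords("This is Peter's 5th program.");
        Console.WriteLine(string.Join(", ", words));
        // prints: This, is, Peter, program
    }
}
```

Note that char.IsLetter accepts any Unicode letter, not just a-z and A-Z, which may or may not be what you want.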


This one will work for the input you specified:
(^|\s)(?<word>[a-zA-Z][a-zA-Z']*)

I agree with OriginalGriff that making a regex that works 100% of the time is, if not impossible, then at least close to it. If you do not require 100% precision, then the regex should work out for you.
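For anyone who wants to try the pattern, here is a small sketch of how it might be used from C#. One thing to be aware of: as written, the word group captures "Peter's" with the apostrophe and the s. The second pattern, LettersOnly, is my own assumed variant (not from the answer) that drops the apostrophe from the character class so the capture stops at "Peter". The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordRegexDemo
{
    // The pattern from the answer: a word starts at the beginning of the
    // input or after whitespace, begins with a letter, and continues with
    // letters or apostrophes.
    public static readonly Regex WithApostrophes =
        new Regex(@"(^|\s)(?<word>[a-zA-Z][a-zA-Z']*)");

    // Assumed variant (not from the answer): letters only, so the capture
    // stops before an apostrophe.
    public static readonly Regex LettersOnly =
        new Regex(@"(^|\s)(?<word>[a-zA-Z]+)");

    // Collects the value of the named group "word" from every match.
    public static List<string> Extract(Regex pattern, string input)
    {
        var words = new List<string>();
        foreach (Match match in pattern.Matches(input))
        {
            words.Add(match.Groups["word"].Value);
        }
        return words;
    }

    public static void Main()
    {
        const string input = "This is Peter's 5th program.";
        Console.WriteLine(string.Join(", ", Extract(WithApostrophes, input)));
        // prints: This, is, Peter's, program
        Console.WriteLine(string.Join(", ", Extract(LettersOnly, input)));
        // prints: This, is, Peter, program
    }
}
```

Both patterns skip "5th" because a match has to start with a letter right after whitespace or the start of the input. Neither is a complete solution: the letters-only variant would, for example, also pull "abc" out of "abc123".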

