Regex to match words and those with an apostrophe

Question

Update: As per comments regarding the ambiguity of my question, I've increased the detail in the question.

(Terminology: by words I am referring to any succession of alphanumeric characters.)

I'm looking for a regex to match the following, verbatim:

  • Words.
  • Words with one apostrophe at the beginning.
  • Words with any number of non-contiguous apostrophes throughout the middle.
  • Words with one apostrophe at the end.

I would like to match the following, however not verbatim, rather, removing the apostrophes:

  • Words with an apostrophe at the beginning and at the end would be matched to the word, without the apostrophes. So 'foo' would be matched to foo.
  • Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after the contiguous apostrophes. So, foo''bar would be matched to foo and bar.
  • Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word, without the apostrophes. So, ''foo would be matched to foo and ''foo'' to foo.

Examples. These would be matched verbatim:

  • 'bout
  • it's
  • persons'

But these would be ignored:

  • '
  • ''

And, for 'open', open would be matched.

Recommended answer

Try this:

(?=.*\w)^(\w|')+$

'bout     # pass
it's      # pass
persons'  # pass
'         # fail
''        # fail
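
The pass/fail list above can be checked mechanically. The following is a minimal sketch, assuming Python's `re` module (the answer itself names no language); `is_word` and `samples` are illustrative names, not part of the answer. In Python 3, `\w` also matches the underscore and non-ASCII letters, so it is slightly broader than "alphanumeric".

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")

def is_word(candidate):
    """Return True if the whole string is accepted by the answer's pattern."""
    return PATTERN.search(candidate) is not None

# The five test strings listed in the answer.
samples = ["'bout", "it's", "persons'", "'", "''"]
results = {s: is_word(s) for s in samples}

for sample, ok in results.items():
    print(f"{sample!r:12} {'pass' if ok else 'fail'}")
```

Because the pattern only accepts or rejects a whole string, `'open'` is accepted with both apostrophes intact; removing them would need a separate step.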

Regex explanation

NODE      EXPLANATION
  (?=       look ahead to see if there is:
    .*        any character except \n (0 or more times
              (matching the most amount possible))
    \w        word characters (a-z, A-Z, 0-9, _)
  )         end of look-ahead
  ^         the beginning of the string
  (         group and capture to \1 (1 or more times
            (matching the most amount possible)):
    \w        word characters (a-z, A-Z, 0-9, _)
   |         OR
    '         a literal apostrophe (')
  )+        end of \1 (NOTE: because you're using a
            quantifier on this capture, only the LAST
            repetition of the captured pattern will be
            stored in \1)
  $         before an optional \n, and the end of the
            string
