Regex to match certain characters and exclude certain characters but without negative lookahead


Problem description

I want a regex that matches all emojis (or most of them) but excludes certain characters (such as “, ”, ‘, ’, …, and —).

This regex does the job via negative lookahead:

/(?!\u201C|\u201D|\u2018|\u2019|\u2026|\u2014)(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/

But apparently Google Apps Script doesn't support this. Error:

Invalid regular expression pattern (?!"|"|‘|’|…|—)(©|®|[ -㌀]|?[퀀-?]|?[퀀-?]|?[퀀-?])

Is there another way to achieve my goal (a regex that works with Google Apps Script's findText)?

Recommended answer

Option 1

Maybe

[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]

might work well enough for the emojis you want.
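As a quick sanity check, the class above can be tried in a standard JavaScript engine. This is only a sketch: it assumes a runtime where the `u` flag is available (needed for `\u{...}` code point escapes), the names `emojiRegex` and `unwanted` are made up for illustration, and whether findText accepts the same escape syntax is not confirmed here.

```javascript
// Option 1 character class, compiled with the `u` flag so that \u{...}
// escapes are read as whole code points instead of UTF-16 halves.
const emojiRegex = /[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}]/u;

// The punctuation from the question that must NOT match:
// left/right double quotes, left/right single quotes, ellipsis, em dash.
const unwanted = ['\u201C', '\u201D', '\u2018', '\u2019', '\u2026', '\u2014'];

// True when none of the unwanted characters is matched by the class.
const noneMatched = unwanted.every((ch) => !emojiRegex.test(ch));

console.log(emojiRegex.test('\u{1f600}')); // grinning face, inside 1f600-1f64f
console.log(emojiRegex.test('\u00a9'));    // copyright sign, listed explicitly
console.log(noneMatched);
```

None of the listed ranges contains the six punctuation characters, so no lookahead is needed with this class.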

Otherwise, you might want to negate the undesired characters using character classes, such as:

[these unicode ranges &&[^these unicodes]]

which would become pretty complicated, yet is possible in regex flavors that support character class intersection (the Java-style && syntax).
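Where neither lookahead nor class intersection is available, the same "these ranges minus those characters" result can be approximated in code instead of in the pattern. The sketch below is a different technique (post-filtering matches), not the intersection syntax itself. It reuses the alternation from the question without the lookahead; `EXCLUDED` and `findEmojis` are hypothetical names, and it runs on plain strings rather than through findText.

```javascript
// Broad pattern from the question, minus the negative lookahead.
// It works on UTF-16 code units, so astral emojis match as surrogate pairs.
const BROAD = /\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]/g;

// Characters to drop from the matches (hypothetical helper data):
// curly double quotes, curly single quotes, ellipsis, em dash.
const EXCLUDED = new Set(['\u201C', '\u201D', '\u2018', '\u2019', '\u2026', '\u2014']);

// Return every broad match that is not one of the excluded characters.
function findEmojis(text) {
  const matches = text.match(BROAD) || [];
  return matches.filter((m) => !EXCLUDED.has(m));
}

console.log(findEmojis('\u201CHi\u201D \ud83d\ude00 \u2014 \u00a9'));
```

The filter step plays the role of the `[^...]` half of the intersection, at the cost of a second pass over the matches.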

With this option you can most likely solve your problem much more simply. My guess is that the undesired punctuation characters already fall inside the desired Unicode ranges. Check whether that is the case. For example, in

[\u0100-\u0200]

you might have \u0150 and \u0175 as undesired characters that you want removed from the desired Unicode ranges you already have.

You can then simply split the range around them, such as with:

[\u0100-\u014F\u0151-\u0174\u0176-\u0200]

and with that, the problem is solved.
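The question's own range [\u2000-\u3300] does contain all six unwanted characters (\u2014, \u2018, \u2019, \u201C, \u201D, \u2026), so the same splitting applies there. A minimal sketch that does the hexadecimal arithmetic automatically is shown below; `esc` and `splitRange` are hypothetical helpers written for this example, limited to code points in the Basic Multilingual Plane.

```javascript
// Format a BMP code point as a four-digit \uXXXX regex escape.
function esc(cp) {
  return '\\u' + cp.toString(16).toUpperCase().padStart(4, '0');
}

// Hypothetical helper: build the body of a character class that covers
// [start, end] except for the code points listed in `excluded`.
function splitRange(start, end, excluded) {
  const holes = [...new Set(excluded)]
    .filter((cp) => cp >= start && cp <= end)
    .sort((a, b) => a - b);
  const parts = [];
  let lo = start;
  for (const hole of holes) {
    if (hole > lo) parts.push([lo, hole - 1]); // keep the stretch before the hole
    lo = hole + 1;                             // resume right after the hole
  }
  if (lo <= end) parts.push([lo, end]);
  return parts
    .map(([a, b]) => (a === b ? esc(a) : esc(a) + '-' + esc(b)))
    .join('');
}

const body = splitRange(0x2000, 0x3300,
  [0x201C, 0x201D, 0x2018, 0x2019, 0x2026, 0x2014]);
console.log(body);
// \u2000-\u2013\u2015-\u2017\u201A-\u201B\u201E-\u2025\u2027-\u3300

// The resulting class needs no lookahead.
const safeRange = new RegExp('[' + body + ']');
```

The generated body can replace [\u2000-\u3300] in the original alternation, with the `(?!...)` group dropped.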

javascript unicode emoji regex
