Ignoring invisible characters in RegEx

Question

I've run into a conundrum.

I am currently trying to build a regex to filter out some particularly nasty scam emails. I'm sure you've seen them before, using a data dump from a compromised website to threaten to reveal intimate videos.

That's all well and good, except I noticed while testing the regex that some of these messages insert special invisible characters in the middle of words. Like you might see here (I've found it especially hard to find a place that keeps these special characters): Regexr link

I find myself looking for a way to create a regex that ignores these characters altogether, since some emails have them and some don't. In the end, I'm trying to match something like:

/all (.*)your contacts

Recommended answer

If there's a particular string you're trying to flag, you could do something like this:

Detect "email" with optional invis characters: /e[^\w]?m[^\w]?a[^\w]?i[^\w]?l/
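To see the pattern in action, here is a minimal Python sketch. The sample strings are made up, and the zero-width space (U+200B) is only an assumed stand-in for whatever invisible characters the real messages contain.

```python
import re

# The pattern from the answer, written as a Python raw string.
EMAIL_RE = re.compile(r"e[^\w]?m[^\w]?a[^\w]?i[^\w]?l")

plain = "check your email"
# Assumed example: zero-width spaces (U+200B) hidden inside the word.
hidden = "check your e\u200bma\u200bil"

print(bool(EMAIL_RE.search(plain)))   # True
print(bool(EMAIL_RE.search(hidden)))  # True
```

Both strings match because each [^\w]? may consume either one non-word character or nothing at all.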

[^\w]? optionally matches a single non-word character, that is, anything other than a letter, digit, or underscore (it is equivalent to \W?). Note that this also covers ordinary spaces and punctuation, not just invisible characters. If you're seeing more than one invisible character between letters, use [^\w]* instead.
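Typing the separator between every letter by hand gets tedious, so the pattern can be generated instead. This is only a sketch: the helper name tolerant_regex is hypothetical, and the sample text and the U+200B characters in it are assumptions for illustration.

```python
import re

def tolerant_regex(word: str) -> re.Pattern:
    """Allow any run of non-word characters between the letters of `word`."""
    # re.escape guards against characters that are regex metacharacters.
    return re.compile(r"[^\w]*".join(re.escape(ch) for ch in word))

contacts_re = tolerant_regex("contacts")

# Two zero-width spaces in a row: [^\w]? would miss this, [^\w]* does not.
sample = "send this to all of your con\u200b\u200btacts"
print(bool(contacts_re.search(sample)))  # True
```

One trade-off to keep in mind: because the separator also accepts spaces, tolerant_regex("email") matches inside "the mail" as well, so the looser pattern can produce false positives on ordinary text.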
