Find similar ASCII characters in Unicode


Question

Does anyone know an easy way to find Unicode characters that look similar to ASCII characters? An example is "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to search for such characters and replace them. By similar I mean similar to a human reader: you can't see a difference by looking at it.

Recommended answer

As noted by other commenters, Unicode normalisation ("compatibility characters") isn't going to help you here, as you aren't looking for official equivalences but for similarities in glyphs (letter shapes). (The linked Unicode Technical Report is still worth reading, though, as it is extremely well written.)
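To illustrate why normalisation falls short, here is a minimal Python sketch using the standard `unicodedata` module. The choice of NFKC and of the fullwidth "s" as a contrasting example are my own assumptions, not something from the original thread.

```python
import unicodedata

cyrillic_dze = "\u0455"  # CYRILLIC SMALL LETTER DZE, looks like Latin "s"
fullwidth_s = "\uff53"   # FULLWIDTH LATIN SMALL LETTER S, a compatibility character

# Compatibility normalisation folds the fullwidth letter to ASCII "s" ...
folded_fullwidth = unicodedata.normalize("NFKC", fullwidth_s)

# ... but leaves the Cyrillic look-alike untouched, because it is a
# distinct letter with no official equivalence to Latin "s".
folded_cyrillic = unicodedata.normalize("NFKC", cyrillic_dze)

print(unicodedata.name(cyrillic_dze))
print(folded_fullwidth == "s")
print(folded_cyrillic == "s")
```

The last comparison is false: the two glyphs look the same, but Unicode treats them as unrelated characters, so a separate look-alike table is needed.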

If I were you, to spare myself the tedious work of assembling a list of characters by hand, I'd search for resources on homograph attacks. This is a method of maliciously misleading web users by displaying URLs containing domain names in which some letters have been replaced with visually similar letters. Another Unicode Technical Report, on security, contains a section on the problem. There is also a "confusables" table, and that may be what you need most. Here's another article covering mainly punctuation marks, some of which are ASCII, that have visually similar counterparts in the non-ASCII code tables.
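As a rough sketch of how such a table could drive the search and replace, the Python below parses lines in a "source ; target ; type # comment" layout, which I am assuming matches the published confusables data file. The three sample entries are hand-written for illustration, and `parse_confusables` and `to_ascii_lookalikes` are hypothetical helper names, not part of any library.

```python
# Hand-written sample in the assumed "source ; target ; type # comment" layout.
SAMPLE = """\
# illustrative excerpt, not the real data file
0455 ; 0073 ; MA # CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
"""


def parse_confusables(text):
    """Build a dict mapping each source string to its look-alike target."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated list of hexadecimal code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def to_ascii_lookalikes(s, table):
    """Replace characters whose look-alike target is pure ASCII."""
    out = []
    for ch in s:
        repl = table.get(ch, ch)
        out.append(repl if repl.isascii() else ch)
    return "".join(out)


table = parse_confusables(SAMPLE)
spoofed = "pa\u0455\u0455w\u043erd"  # contains Cyrillic dze and o
cleaned = to_ascii_lookalikes(spoofed, table)
print(spoofed == "password", cleaned == "password")
```

This only handles single-character sources and keeps any character whose target is not ASCII. A real table is much larger and also maps some characters to multi-character sequences, so treat this as a starting point.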

What I do hope is that you aren't asking the question in order to construct such an attack.
