Find similar ASCII characters in Unicode

Question

Does anyone know an easy way to find Unicode characters that look like ASCII characters? An example is "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to search for such look-alike characters and replace them. By "similar" I mean similar to a human reader: you can't see a difference by looking at it.

Recommended answer

As other commenters have noted, Unicode normalisation ("compatibility characters") isn't going to help you here, because you aren't looking for official equivalences but for similarities between glyphs (letter shapes). (The linked Unicode Technical Report is still worth reading, though, as it is extremely well written.)
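A quick way to see why normalisation falls short is to run the example character through it. The sketch below uses Python's standard `unicodedata` module; the helper name `nfkc` and the choice of a fullwidth letter for contrast are illustrative assumptions, not part of the original answer.

```python
import unicodedata

dze = "\u0455"          # CYRILLIC SMALL LETTER DZE, looks like "s"
fullwidth_s = "\uff53"  # FULLWIDTH LATIN SMALL LETTER S, a compatibility character


def nfkc(text):
    """Apply compatibility normalisation (NFKC) to a string."""
    return unicodedata.normalize("NFKC", text)


for ch in (dze, fullwidth_s):
    # Report whether normalisation maps the character to ASCII "s".
    print(unicodedata.name(ch), "->", nfkc(ch) == "s")
```

The fullwidth letter has an official compatibility mapping to ASCII `s`, so NFKC folds it. The Cyrillic letter is a distinct letter that merely shares a glyph shape, so it passes through unchanged.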

If I were you, to avoid the tedious work of assembling a list of characters yourself, I'd search for resources on homograph attacks. This is a method of maliciously misleading web users by displaying URLs whose domain names have some letters replaced with visually similar ones. Another Unicode Technical Report, on security, contains a section on the problem. There is also a "confusables" table, which may be what you need most. Here's another article, covering mainly punctuation marks (some of them ASCII) that have visually similar counterparts in the non-ASCII code tables.
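As a rough illustration of how such a table could drive a search and replace, here is a minimal sketch. It assumes the data follows the semicolon-separated layout used by the Unicode confusables file (source code points, target code points, then a type and a `#` comment). The two sample lines are illustrative entries written in that layout, and the names `parse_confusables` and `to_ascii_lookalikes` are hypothetical helpers, not part of any library.

```python
# Illustrative sample in the assumed "source ; target ; type # comment" layout.
SAMPLE = (
    "# sample data, not the full table\n"
    "0455 ;\t0073 ;\tMA\t# CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S\n"
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A\n"
)


def parse_confusables(text):
    """Build {lookalike character: ASCII replacement} from confusables-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        # Keep only single non-ASCII characters whose target is pure ASCII.
        if len(source) == 1 and not source.isascii() and target.isascii():
            table[source] = target
    return table


def to_ascii_lookalikes(text, table):
    """Replace every character found in the table with its ASCII look-alike."""
    return text.translate(str.maketrans(table))


table = parse_confusables(SAMPLE)
print(to_ascii_lookalikes("pa\u0455\u0455word", table))
```

With the real table you would read the downloaded file instead of `SAMPLE`, for example with `open(path, encoding="utf-8-sig")` to skip a leading byte-order mark. Whether every mapping in it suits your use case is something to check by hand, since some entries map to sequences rather than single characters.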

What I do hope is that you aren't asking this question in order to construct such an attack.
