Homoglyph attack detection in email phishing

Problem description

Main question

I am working on an API in Java that needs to detect the use of brand names (e.g. PayPal, Mastercard) in phishing emails.

Attackers obviously use various strategies to disguise these brand names so that they are harder to detect. For instance, "rnastercard" looks very similar to "mastercard" and can fool an unsuspecting user.

At this point I can easily detect misspellings of these brands using a form of fuzzy string search. The problem I am facing is when the attacker uses homoglyphs to change the name of a brand while keeping the same visual appearance.
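
The question does not say which fuzzy search is used. As a minimal sketch, assuming plain Levenshtein edit distance, the misspelling check might look like this; FuzzyBrandMatch, looksLike and the maxDistance threshold are hypothetical names, not part of the asker's API.

```java
import java.util.Locale;

// Hypothetical helper: fuzzy brand matching via Levenshtein edit distance (an assumed metric).
final class FuzzyBrandMatch {

    // Classic dynamic-programming edit distance; insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Flags a candidate within maxDistance edits of the brand, ignoring case.
    static boolean looksLike(String candidate, String brand, int maxDistance) {
        return levenshtein(candidate.toLowerCase(Locale.ROOT), brand.toLowerCase(Locale.ROOT)) <= maxDistance;
    }
}
```

With this metric "rnastercard" is two edits (substitute r with m, delete n) away from "mastercard". Edit distance counts every substitution the same, so a Greek rho in place of a p costs exactly as much as any unrelated letter; the metric has no notion of visual similarity.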

A homoglyph attack substitutes a character from the [a-zA-Z] range with a character that looks similar but lies outside this range. For example, an attacker using a particular character set can target PayPal with the Greek letter RHO, which looks like P. The PayPal brand name in this sort of attack would become:

[Greek letter RHO] [a] [y] [Greek letter RHO] [a] [l]
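
To make the example concrete: in Java source the attacked brand could be written as "\u03C1ay\u03C1al" (using the lowercase rho, U+03C1). A cheap first step is simply to flag letters that fall outside [a-zA-Z]. The sketch below uses only java.lang: Character.getName and Character.UnicodeScript report what each suspicious code point actually is. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every letter that is not in [a-zA-Z].
final class OutsideAsciiLetters {

    static List<String> suspicious(String s) {
        List<String> hits = new ArrayList<>();
        // Iterate by code point so characters outside the BMP are handled correctly.
        s.codePoints().forEach(cp -> {
            boolean asciiLetter = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
            if (Character.isLetter(cp) && !asciiLetter) {
                hits.add(String.format("U+%04X %s (%s)",
                        cp, Character.getName(cp), Character.UnicodeScript.of(cp)));
            }
        });
        return hits;
    }
}
```

For "\u03C1ay\u03C1al" this reports "U+03C1 GREEK SMALL LETTER RHO (GREEK)" twice. Flagging alone does not say which ASCII letter the character imitates; that still needs a mapping.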

Since I have little to no experience with standards like Unicode or ISO and their encodings, I am calling upon your advice. Is there a way to programmatically determine the visual equivalent of a character outside the [a-zA-Z] set, so that the result would be a character inside the [a-zA-Z] set?

Some of your answers might be based on a particular character set, but I am looking for a solution that would help me determine such representations for every character set usable inside an email.

I have not read the RFC standards for mail exchange yet, but they are on my list; I am asking this question now to save time.

Possible but unworkable solutions

I have thought of some solutions but they are not workable for my particular case since they are very CPU intensive and of a hack-like nature (read "may be easily broken").

The first solution would be to render the character that is outside [a-zA-Z] into an image and feed that image to an OCR API to get its closest [a-zA-Z] representation.

The second solution would be to create a map for each character set, where the key is the character itself and the value is its [a-zA-Z] equivalent. This map would have to be built either by hand or by using the first solution described above.
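
A minimal sketch of that second solution, assuming the text has already been decoded into a Java String: because a Java String holds Unicode regardless of the charset the mail arrived in, one table keyed by code point can stand in for the per-charset maps. The entries below are an illustrative, hand-picked subset (Greek rho and a few Cyrillic lookalikes), not a complete table; HomoglyphMap and toAscii are hypothetical names, and Map.of needs Java 9 or later.

```java
import java.util.Map;

// Hypothetical hand-built homoglyph table keyed by Unicode code point.
final class HomoglyphMap {

    // Illustrative subset only; a usable table would be far larger.
    private static final Map<Integer, Character> TO_ASCII = Map.of(
            0x03C1, 'p', // GREEK SMALL LETTER RHO
            0x0440, 'p', // CYRILLIC SMALL LETTER ER
            0x0430, 'a', // CYRILLIC SMALL LETTER A
            0x0435, 'e', // CYRILLIC SMALL LETTER IE
            0x043E, 'o', // CYRILLIC SMALL LETTER O
            0x0441, 'c'  // CYRILLIC SMALL LETTER ES
    );

    // Replaces every mapped lookalike with its ASCII letter; other characters pass through.
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            Character mapped = TO_ASCII.get(cp);
            if (mapped != null) {
                sb.append(mapped.charValue());
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }
}
```

Here toAscii("\u03C1ay\u03C1al") yields "paypal". A one-character-to-one-character map like this cannot express multi-character tricks such as "rn" for "m", so it complements the fuzzy search rather than replacing it.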

Additional details

I have already asked this question here. However, the question remained closed despite my editing efforts, probably because I didn't express myself well and didn't tag the question properly.

In that particular question I also addressed some concerns I had regarding the character sets used by Java, which clouded the actual question. However, if you feel the need to include such information in your answer, I would be grateful, since it would save me some time researching such questions. The question of homoglyph attacks and the question of character sets in Java or javax.mail.* are separate but linked.
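
On the charset side, the key point is that once the bytes of a message are decoded, a Java String holds Unicode (UTF-16) no matter which charset the mail declared. The sketch below decodes the same Greek rho from UTF-8 bytes and from UTF-16BE bytes and gets the same code point, U+03C1, both times; CharsetDecoding and its members are hypothetical names. For headers, javax.mail's MimeUtility.decodeText decodes RFC 2047 encoded-words into a String, and other charsets can be obtained with Charset.forName, subject to what the JVM supports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: different byte encodings of one character decode to the same String.
final class CharsetDecoding {

    // GREEK SMALL LETTER RHO (U+03C1) as raw bytes in two different charsets.
    static final byte[] RHO_UTF8 = {(byte) 0xCF, (byte) 0x81};
    static final byte[] RHO_UTF16BE = {0x03, (byte) 0xC1};

    // Decoding turns charset-specific bytes into a Java String (always Unicode).
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Homoglyph checks can then work on code points, independent of the source charset.
    static int firstCodePoint(byte[] raw, Charset charset) {
        return decode(raw, charset).codePointAt(0);
    }

    static boolean sameText() {
        return decode(RHO_UTF8, StandardCharsets.UTF_8)
                .equals(decode(RHO_UTF16BE, StandardCharsets.UTF_16BE));
    }
}
```

Because of this, homoglyph checks can be written once against code points instead of once per character set.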

This email is a particular example of the homoglyph attack described in the main question. BEWARE! It is the actual content of a phishing email that uses this attack method, so do not follow any link it contains.

I've tagged this question with what I thought would be the appropriate tags; if you disagree, please edit the question rather than vote to close it.

Recommended answer

As part of TR-39 (Unicode Technical Standard #39, "Unicode Security Mechanisms"), the Unicode Consortium maintains a list of confusables that you can use to help with your mapping. I can't vouch for its completeness.
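
The list is published as a plain-text file (confusables.txt in the Unicode security data). Its data lines have the form "source ; target ; type # comment", where source is one hexadecimal code point and target is one or more space-separated hexadecimal code points. A minimal parser sketch, assuming you have downloaded a local copy; ConfusablesParser and the choice to pass in the lines as a List are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser: code point -> prototype string, from confusables.txt-style lines.
final class ConfusablesParser {

    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> table = new HashMap<>();
        for (String line : lines) {
            // Drop the trailing comment and a possible byte-order mark, then trim.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).replace("\uFEFF", "").trim();
            if (data.isEmpty()) {
                continue; // blank or comment-only line
            }
            String[] fields = data.split(";");
            if (fields.length < 2) {
                continue; // not a data line
            }
            int source = Integer.parseInt(fields[0].trim(), 16);
            StringBuilder target = new StringBuilder();
            for (String hex : fields[1].trim().split("\\s+")) {
                target.appendCodePoint(Integer.parseInt(hex, 16));
            }
            table.put(source, target.toString()); // the type column is ignored here
        }
        return table;
    }
}
```

For example, ConfusablesParser.parse(Files.readAllLines(Path.of("confusables.txt"), StandardCharsets.UTF_8)), where the path is wherever you saved the file.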

TR-39 also describes a skeleton algorithm that uses the list of confusables to compare confusable strings. There is a Go implementation of the algorithm, and I've written a quick Java port.
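
In outline, the skeleton transform converts the string to NFD, replaces each character by its prototype from the confusables table (keeping characters that have no entry), and converts the result to NFD again; two strings are considered confusable when their skeletons are equal. The following is a minimal Java sketch of that outline, not the Go implementation or the Java port mentioned above. Its four-entry table is an illustrative stand-in with entries chosen for this example, not copied from confusables.txt, and the class name is hypothetical.

```java
import java.text.Normalizer;
import java.util.Map;

// Hypothetical sketch of a TR-39-style skeleton comparison.
final class Skeleton {

    // Illustrative prototype table; in practice, load the full confusables data.
    private static final Map<Integer, String> PROTOTYPES = Map.of(
            0x03C1, "p",  // GREEK SMALL LETTER RHO
            0x0430, "a",  // CYRILLIC SMALL LETTER A
            0x0441, "c",  // CYRILLIC SMALL LETTER ES
            0x006D, "rn"  // LATIN SMALL LETTER M, illustrative "rn" prototype
    );

    static String skeleton(String s) {
        // 1. Decompose to NFD.
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        // 2. Replace each code point by its prototype, if it has one.
        StringBuilder sb = new StringBuilder();
        nfd.codePoints().forEach(cp -> {
            String proto = PROTOTYPES.get(cp);
            if (proto != null) {
                sb.append(proto);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        // 3. Decompose to NFD again.
        return Normalizer.normalize(sb, Normalizer.Form.NFD);
    }

    static boolean confusable(String a, String b) {
        return skeleton(a).equals(skeleton(b));
    }
}
```

With this table, "\u03C1ay\u03C1al" and "paypal" share a skeleton, as do "rnastercard" and "mastercard", while "PayPal" and "paypal" do not, since the skeleton leaves case alone.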

Aside from this, removing diacritics and upper case will also help, because the skeleton algorithm does not normalize these. So the full process should be something like: skeleton -> remove diacritics -> to lower case.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical wrapper class so the snippet compiles on its own.
final class DiacriticsStripper {

    /*
     * Special regular expression character ranges relevant for simplification
     * -> see http://docstore.mik.ua/orelly/perl/prog3/ch05_04.htm
     * InCombiningDiacriticalMarks: special marks that are part of "normal" ä,
     * ö, î etc. IsSk: Symbol, Modifier, see
     * http://www.fileformat.info/info/unicode/category/Sk/list.htm
     * IsLm: Letter, Modifier, see
     * http://www.fileformat.info/info/unicode/category/Lm/list.htm
     */
    private static final Pattern DIACRITICS_AND_FRIENDS =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    static String stripDiacritics(String str) {
        // NFD splits precomposed letters such as "ä" into base letter + combining mark.
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the combining marks and any modifier letters/symbols.
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }
}
```
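
Putting the three steps together, here is a hedged sketch of the full comparison (skeleton, then the diacritics pattern from stripDiacritics, then lower case), applied to both the candidate and the brand so the comparison stays symmetric. The prototype table is again a tiny illustrative stand-in, not the real confusables data, and BrandDetector, normalizeForComparison and mentionsBrand are hypothetical names.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical end-to-end normalizer: skeleton -> remove diacritics -> lower case.
final class BrandDetector {

    // Illustrative stand-in for the confusables table.
    private static final Map<Integer, String> PROTOTYPES = Map.of(
            0x03C1, "p", // GREEK SMALL LETTER RHO
            0x0430, "a"  // CYRILLIC SMALL LETTER A
    );

    private static final Pattern DIACRITICS_AND_FRIENDS =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    // Step 1: skeleton (NFD, map to prototypes, NFD again).
    static String skeleton(String s) {
        StringBuilder sb = new StringBuilder();
        Normalizer.normalize(s, Normalizer.Form.NFD).codePoints().forEach(cp -> {
            String proto = PROTOTYPES.get(cp);
            if (proto != null) {
                sb.append(proto);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return Normalizer.normalize(sb, Normalizer.Form.NFD);
    }

    // Steps 2 and 3: strip diacritics, then lower-case with locale-neutral rules.
    static String normalizeForComparison(String s) {
        String stripped = DIACRITICS_AND_FRIENDS.matcher(skeleton(s)).replaceAll("");
        return stripped.toLowerCase(Locale.ROOT);
    }

    // True if the token normalizes to the same form as any of the brands.
    static boolean mentionsBrand(String token, List<String> brands) {
        String t = normalizeForComparison(token);
        return brands.stream().anyMatch(b -> normalizeForComparison(b).equals(t));
    }
}
```

Both normalizeForComparison("\u03C1ay\u03C1\u00E1l") and normalizeForComparison("PayPal") give "paypal". Lower-casing with Locale.ROOT avoids locale-specific rules such as the Turkish dotless i.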
