How can I programmatically get all of a string's unicode entities to resolve themselves?


Problem description

I'm trying to mitigate XSS. How can I guard against this:

j&#X41vascript:alert('test2')

in the href of a link?

I've tried the following, but it just assigns the literal, unresolved value of the string above as a relative path in the href, not a proper javascript: href capable of triggering code execution. I'm wondering how an attacker might be able to exploit this.

Here is what I tried:

a = document.createElement('a');

and then:

a.href = "j&#X41vascript:alert('test2')";

and also this:

a.setAttribute('href', "j&#X41vascript:alert('test2')");

But when I then query a.href, both return "j&#X41vascript:alert('test2')", not the desired (or undesired, depending on your perspective) javascript:alert('test2');

If I can get all the entities to resolve, then I can strip out all occurrences of javascript: in the resulting string and be safe, right?

The other thing I was wondering is: what if someone does j&amp;#X41vascript:steal_cookie();? I mean, theoretically, they could have infinite levels of nesting, and it would all ultimately resolve, right?

function resolve_entities(str) {
  var s = document.createElement('span')
    , nestTally = str.match(/&/) ? 0 : 1
    , limit = 5
    , limitReached = false;

  s.innerHTML = str;
  while (s.textContent.match(/&/)) {
    s.innerHTML = s.textContent;
    if(nestTally++ >= limit) {
      limitReached = true;
      break;
    }
  }

  return s.textContent;
}

Recommended answer

XML/HTML character entities like &#x41; or &amp; are decoded when the string containing them is parsed as XML or HTML. Typically, this happens when they are sent from the server to the browser as part of an HTML page, although other situations (such as assigning to element.innerHTML in JavaScript) can also cause a string to be parsed as XML or HTML.

Reading or writing element attributes in JavaScript does not trigger XML/HTML parsing, and thus does not expand character entities. If you write

a.href = "j&#x41;vascript:alert('test')";

then the href attribute of that a element will be j&#x41;vascript:alert('test'), ampersands and all.

What's important to note is that, whenever a string is parsed as XML or HTML, character entities are decoded exactly once. Thus, &#x41; becomes A, while &amp;#x41; becomes &#x41;. It will not "all ultimately resolve", unless you're doing something silly like reading from .textContent and assigning to .innerHTML repeatedly.

Once the parsing is complete, it's completely irrelevant whether any character sequences in the output look like XML/HTML character entities, unless you then take the output and feed it through an XML/HTML parser again. (Doing that is very rarely useful, and usually only happens due to bugs such as assigning to .innerHTML when one should have assigned to .textContent.)
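To make the "decoded exactly once" point concrete, here is a minimal, DOM-free sketch. The function name decodeEntitiesOnce is hypothetical, and it only handles numeric character references and five common named entities; it illustrates single-pass decoding and is not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: a simplified single-pass decoder, NOT a full HTML parser.
// Assumption: only numeric references and five common named entities are handled.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

function decodeEntitiesOnce(str) {
  // A single replace() pass never re-scans its own output,
  // so text produced by decoding is not decoded a second time.
  return str.replace(
    /&(?:#[xX]([0-9a-fA-F]+);?|#([0-9]+);?|(amp|lt|gt|quot|apos);)/g,
    (match, hex, dec, name) => {
      if (name) return NAMED[name];
      const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // NUL and out-of-range references become U+FFFD.
      if (codePoint === 0 || codePoint > 0x10ffff) return "\uFFFD";
      return String.fromCodePoint(codePoint);
    }
  );
}

// One pass removes one level of encoding.
const once = decodeEntitiesOnce("j&amp;#X41vascript:");
// A second pass only happens if the caller explicitly feeds the output back in.
const twice = decodeEntitiesOnce(once);
console.log(once, twice);
```

A doubly encoded string only "fully resolves" when the output is deliberately passed through the decoder again, which mirrors the .textContent to .innerHTML loop described above.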

Anyway, looking at the comments, you say you're writing some client-side JavaScript code that gets untrusted data from a server you don't control, and you're worried that simply assigning the data to .innerHTML could allow XSS attacks. If so, there are two cases:

  1. The data you receive is meant to be plain text. In that case, you should simply assign it to .textContent and be done with it.

  2. The data you receive is, in fact, meant to be HTML. In that case, you do need to undertake the difficult and laborious job of sanitizing it. The JavaScript HTML sanitizer from the Caja project might help.
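For the specific href case raised in the question, one common alternative to searching a string for javascript: is to let a URL parser normalize the value and then allow only known schemes. The following is a rough sketch, assuming the standard URL class is available (it is in browsers and in Node.js); the function name isAllowedHref, the allowlist, and the example base URL are all hypothetical.

```javascript
// Hypothetical allowlist; adjust to the schemes your application actually needs.
const ALLOWED_PROTOCOLS = ["http:", "https:", "mailto:"];

function isAllowedHref(href, base) {
  try {
    // The URL parser normalizes the scheme (for example, lowercases it)
    // before we inspect it, so "JaVaScRiPt:" cannot slip past the check.
    const url = new URL(href, base);
    return ALLOWED_PROTOCOLS.includes(url.protocol);
  } catch (err) {
    // Input that cannot be parsed as a URL is rejected.
    return false;
  }
}

const base = "https://example.com/"; // assumed page URL, for illustration only
console.log(isAllowedHref("javascript:alert('test2')", base));
console.log(isAllowedHref("/profile", base));
```

Checked this way, the undecoded string j&#X41vascript:alert('test2') is treated as a relative URL under the base, which matches what the question observed when assigning it to a.href. This check covers a single attribute value only; untrusted HTML as a whole still needs a proper sanitizer.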
