Is it wise to use jQuery for whitelisting tags? Are there existing solutions in JavaScript?


Question

I want to clean HTML that is pasted into a rich text editor (FCK 1.6 at the moment). The cleaning should be based on a whitelist of tags (and perhaps a second whitelist of attributes). The main goal is not to prevent XSS but to remove ugly HTML.

Currently I see no way to do this on the server, so I guess it must be done in JavaScript.

I found the jquery-clean plugin, but as far as I can see it uses regexes to do the work, and we know that is not safe.

As I have not found any other JS-based solution, I have started to implement one myself using jQuery. It would work by creating a jQuery version of the pasted HTML ($(pastedHtml)) and then traversing the resulting tree, removing each element that does not match the whitelist by looking at its tagName property.

  • Is this any better?
  • Can I trust jQuery to represent the pasted content well (there may be unmatched ending tags and what-have-you)?
  • Is there a better solution already that I couldn't find?

This is my current jQuery-based solution (verbose and not extensively tested):

function clean(element, whitelist, replacerTagName) {
    // Use div if no replace tag was specified
    replacerTagName = replacerTagName || "div";

    // Accept anything that jQuery accepts
    var jq = $(element);    

    // Create a copy of the current element, but without its child elements
    var clone = jq.clone();
    clone.children().remove();

    // Wrap the copy in a dummy parent to be able to search with jQuery selectors
    // 1)
    var wrapper = $('<div/>').append(clone);

    // Check if the element is not on the whitelist by searching with the 'not' selector
    var invalidElement = wrapper.find(':not(' + whitelist + ')');

    // If the element wasn't on the whitelist, replace it.
    if (invalidElement.length > 0) {
        var el = $('<' + replacerTagName + '/>');
        el.text(invalidElement.text());
        invalidElement.replaceWith(el);
    }

    // Extract the (maybe replaced) element
    var cleanElement = $(wrapper.children().first());

    // Recursively clean the children of the original element and
    // append them to the cleaned element
    var children = jq.children();
    if (children.length > 0) {
        children.each(function(_index, thechild) {
            var cleaned = clean(thechild, whitelist, replacerTagName);
            cleanElement.append(cleaned);
        });
    }
    return cleanElement;
}

I am wondering about some points (see comments in the code):

  1. Do I really need to wrap my element in a dummy parent to be able to match it with jQuery's ":not"?
  2. Is this the recommended way to create new nodes?

Recommended answer

If you leverage the browser's HTML-correcting abilities (e.g. you copy the rich text into the innerHTML of an empty div and take the resulting DOM tree), the HTML is guaranteed to be valid, although the way it is corrected is somewhat browser-dependent. The rich text editor probably does this already anyway.
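As a minimal sketch of that idea, the helper below lets the browser parse the pasted markup by assigning it to the innerHTML of a detached div. The name parseToContainer and the optional doc parameter are hypothetical and only there for illustration; in a page you would simply rely on the global document.

```javascript
// Hypothetical helper: let the browser's own parser repair the pasted markup.
// `doc` is an assumption added for illustration; it defaults to the page's document.
function parseToContainer(html, doc) {
  doc = doc || document;
  // A detached div: the parsed nodes are not part of the visible page.
  var container = doc.createElement("div");
  // The browser closes unmatched tags and fixes nesting while parsing.
  container.innerHTML = html;
  return container;
}

// Possible use in a browser:
//   var tree = parseToContainer("<p>one<b>two</p>");
//   tree.childNodes now holds the corrected DOM nodes.
```

How exactly the markup is repaired still depends on the browser, as noted above, so the resulting tree should be inspected rather than assumed.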

jQuery's own text-to-DOM conversion is probably also safe, but definitely slower, so I would avoid it.

Using a whitelist based on the jQuery selector engine might be somewhat tricky, because removing an element while preserving its children might make the document invalid. The browser would then correct it by changing the DOM tree, which might confuse a script that is trying to iterate over the invalid elements. (E.g. you allow ul and li but not ol; the script removes the list root element, the naked li elements are invalid so the browser wraps them in a ul again, and that ul will be missed by the cleaning script.) If you throw away unwanted elements together with all their children, I don't see any problems with that.
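The last option, dropping an unwanted element together with its whole subtree, could look roughly like the sketch below. It uses plain DOM properties (children, tagName, removeChild) rather than jQuery; the function name removeDisallowed and the array form of the whitelist are assumptions made for this example, not part of the question's code.

```javascript
// Hypothetical helper: remove every element whose tag is not on the whitelist,
// together with all of its descendants. Allowed elements are cleaned recursively.
function removeDisallowed(root, allowedTags) {
  // Build a lookup table of upper-cased tag names.
  var allowed = {};
  for (var i = 0; i < allowedTags.length; i++) {
    allowed[allowedTags[i].toUpperCase()] = true;
  }
  // Copy the child list first: root.children is live and shrinks as we remove.
  var children = Array.prototype.slice.call(root.children);
  children.forEach(function (child) {
    if (allowed[child.tagName.toUpperCase()]) {
      removeDisallowed(child, allowedTags);
    } else {
      root.removeChild(child); // drops the element and its whole subtree
    }
  });
  return root;
}

// Possible use in a browser:
//   var container = document.createElement("div");
//   container.innerHTML = pastedHtml;
//   removeDisallowed(container, ["p", "ul", "li", "b"]);
```

Because nothing is ever unwrapped, the remaining tree keeps the same parent-child structure the browser produced, at the cost of losing any text inside the removed elements.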
