Using DOMDocument to Parse HTML with JS code


Problem description

I take HTML in as a string and then parse it to change all href links to something else. This works; however, when the HTML page has JS script tags (i.e. <script>), they get removed! For example, this line:

<script type="text/javascript" src="/js/jquery.js"></script>

gets changed to:

[removed][removed] 

However, I would like to keep everything in. This is my function:

function parse_html_code($code, $code_id){
  libxml_use_internal_errors(true);
  $xml = new DOMDocument();
  $xml->loadHTML($code);
  foreach($xml->getElementsByTagName('a') as $link) {
    $link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));
  }
  return $xml->saveHTML();
}

I'd appreciate any help with this.

Recommended answer

CodeIgniter's bogus anti-XSS ‘feature’ is mauling your script's input before DOMDocument gets a look at it. Script tags and various other strings will be removed, replaced with "[removed]", or otherwise messed about with for no good reason. See the system/libraries/Security.php module for the full embarrassing details.

To turn off this misguided feature, set $config['global_xss_filtering'] = FALSE. You'll have to make sure your script is actually handling string escaping properly, of course (e.g. always HTML-escaping user input when including it in a page). But then you have to do that anyway; anti-XSS doesn't fix your text processing problems, it just obscures them.
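As a rough sketch of what that looks like in practice: the first line is the config setting named above (in a stock CodeIgniter install it lives in application/config/config.php), and the rest shows escaping at output time with PHP's built-in htmlspecialchars(). The helper name h() and the sample $comment value are made up for illustration and are not part of CodeIgniter.

```php
<?php
// In application/config/config.php: stop CodeIgniter from rewriting input.
$config['global_xss_filtering'] = FALSE;

// Hypothetical helper (not a CodeIgniter function): HTML-escape a value
// at the point where it is written into a page.
function h(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample user input, invented for this example.
$comment = '<script>alert("hi")</script>';

// The stored input stays untouched; only the output is escaped.
$escaped = h($comment);
echo $escaped, "\n";
```

With the global filter off, the raw string reaches your code unchanged, so the escaping has to happen wherever the value is printed.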

$link->setAttribute('href', CLK_BASE."clk.php?i=$code_id&j=" . $link->getAttribute('href'));

You'll need to urlencode that getAttribute('href') (and potentially $code_id too, if it's not just numeric or something).
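A minimal sketch of the function with that encoding applied might look like the following. The value given to CLK_BASE here is a placeholder, since the question never shows how that constant is defined, and the sample $html string is invented for the demonstration.

```php
<?php
// Placeholder value; the question does not show what CLK_BASE really is.
if (!defined('CLK_BASE')) {
    define('CLK_BASE', 'https://example.com/');
}

function parse_html_code(string $code, $code_id): string
{
    // Collect libxml warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    $xml->loadHTML($code);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($xml->getElementsByTagName('a') as $link) {
        // getAttribute() returns the decoded value, so encode it before
        // embedding it as a query-string parameter.
        $target = $link->getAttribute('href');
        $link->setAttribute(
            'href',
            CLK_BASE . 'clk.php?i=' . urlencode((string) $code_id)
                . '&j=' . urlencode($target)
        );
    }

    return $xml->saveHTML();
}

// Invented sample input: one script tag and one link with its own query string.
$html = '<html><head><script type="text/javascript" src="/js/jquery.js"></script></head>'
      . '<body><a href="/page.php?a=1&amp;b=2">link</a></body></html>';

$result = parse_html_code($html, 7);
echo $result;
```

Without urlencode(), the `?`, `&` and `=` inside the original link would be read as part of the clk.php query string, so the `j` parameter would be cut short at the first `&`.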
