PHP: parse HTML: extract script tags from body and inject before </body>?
Problem description
I don't care what the library is, but I need a way to extract <script> elements from the <body> of a page (as strings). I then want to insert the extracted <script>s just before </body>.
Ideally, I'd like to split the extracted <script>s into 2 types:
1) External (those that have the src attribute)
2) Embedded (those with code between <script> and </script>)
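For reference, the two-way split described above could be sketched with PHP's built-in DOMDocument. This is only a minimal illustration, assuming the page is well-formed enough for libxml to parse; the function name `classifyBodyScripts` and the sample markup are made up for the example and do not come from any of the libraries mentioned in this question.

```php
<?php
// Hypothetical helper (not from the original post): collect the <script>
// elements found inside <body> and sort them into external vs. embedded.
function classifyBodyScripts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = ['external' => [], 'embedded' => []];
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $result;
    }
    foreach ($body->getElementsByTagName('script') as $script) {
        // A src attribute marks the script as external.
        $type = $script->hasAttribute('src') ? 'external' : 'embedded';
        // saveHTML($node) serializes just that element as a string.
        $result[$type][] = $dom->saveHTML($script);
    }
    return $result;
}

// Made-up sample page: one script in <head>, two in <body>.
$html = '<html><head><script src="head.js"></script></head><body><p>Hi</p>'
      . '<script src="a.js"></script><script>var x = 1;</script></body></html>';
$scripts = classifyBodyScripts($html);
```

Only scripts that are descendants of <body> are collected, so the one in <head> is left out of both lists.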
So far I've tried with phpDOM, Simple HTML DOM and Ganon.
I've had no luck with any of them (I can find links and remove/print them - but fail with scripts every time!).
Alternative to
https://stackoverflow.com/questions/23414887/php-simple-html-dom-strip-scripts-and-append-to-bottom-of-body
(Sorry to repost, but it's been 24 Hours of trying and failing, using alternative libs, failing more etc.).
Based on the lovely RegEx answer from @alreadycoded.com, I managed to botch together the following:
$output = "<html><head></head><body><!-- Your stuff --></body></html>";
$content = '';
$js = '';
// 1) Grab <body>
preg_match_all('#(<body[^>]*>.*?<\/body>)#ims', $output, $body);
$content = implode('',$body[0]);
// 2) Find <script>s in <body>
preg_match_all('#<script(.*?)<\/script>#is', $content, $matches);
foreach ($matches[0] as $value) {
$js .= '<!-- Moved from [body] --> '.$value;
}
// 3) Remove <script>s from <body>
$content2 = preg_replace('#<script(.*?)<\/script>#is', '<!-- Moved to [/body] -->', $content);
// 4) Add <script>s to bottom of <body>
$content2 = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content2);
// 5) Replace <body> with new <body>
$output = str_replace($content, $content2, $output);
Which does the job, and isn't that slow (fraction of a second)
Shame none of the DOM stuff was working (or I wasn't up to wading through naffed objects and manipulating).
Recommended answer
$js = "";
$content = file_get_contents("http://website.com");
preg_match_all('#<script(.*?)</script>#is', $content, $matches);
foreach ($matches[0] as $value) {
$js .= $value;
}
$content = preg_replace('#<script(.*?)</script>#is', '', $content);
echo $content = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content);
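Two things are worth noting about the snippet above: it strips <script> tags from the whole document (including <head>), and it passes the collected scripts to preg_replace as part of the replacement string, where sequences such as $1 or \1 inside the JavaScript would be interpreted as backreferences. A possible variant that stays within <body> and splices plain strings is sketched below. The function name `moveBodyScriptsToEnd` is hypothetical, and the sketch assumes a single <body> element and no literal "</script>" inside script code.

```php
<?php
// Hypothetical helper; a sketch under the assumptions stated above,
// not the answer's original code.
function moveBodyScriptsToEnd(string $html): string
{
    // Locate the body's inner content so scripts in <head> stay untouched.
    if (!preg_match('#<body[^>]*>(.*)</body>#is', $html, $m, PREG_OFFSET_CAPTURE)) {
        return $html;
    }
    [$inner, $offset] = $m[1];

    // Collect the scripts, then strip them from the body content.
    preg_match_all('#<script\b.*?</script>#is', $inner, $scripts);
    $stripped = preg_replace('#<script\b.*?</script>#is', '', $inner);
    if ($stripped === null) {
        return $html; // regex engine error: leave the page unchanged
    }

    // Plain string splicing: nothing inside the scripts is treated as a
    // replacement pattern such as $1.
    return substr($html, 0, $offset)
        . $stripped
        . implode('', $scripts[0])
        . substr($html, $offset + strlen($inner));
}

// Made-up sample page for illustration.
$page = '<html><head><script src="h.js"></script></head>'
      . '<body><script>var p = "$1";</script><p>Hi</p></body></html>';
$moved = moveBodyScriptsToEnd($page);
```

The relative order of the moved scripts is preserved, which matters when later scripts depend on earlier ones.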