PHP: parse HTML: extract script tags from body and inject before </body>?


Question

I don't care which library is used, but I need a way to extract the <script> elements from the <body> of a page (as strings). I then want to insert the extracted <script>s just before </body>.

Ideally, I'd like to split the extracted <script>s into two types:

1) External (those that have a src attribute)
2) Embedded (those with code between <script> and </script>)
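
The two categories above can be told apart by checking each opening tag for a src attribute. The following is a minimal sketch in the same regex style used later in this post, not a robust HTML parser: the function name split_scripts and the sample markup are made up for illustration, and the pattern assumes that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper (the name is illustrative): splits <script> elements
// into external (with a src attribute) and embedded (inline code).
function split_scripts(string $html): array
{
    $external = [];
    $embedded = [];

    // Group 1 is the attribute text of the opening tag, group 2 the element body.
    preg_match_all('#<script\b([^>]*)>(.*?)</script>#is', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // The lookbehind avoids treating attributes such as data-src as src.
        if (preg_match('#(?<![\w-])src\s*=#i', $m[1])) {
            $external[] = $m[0];
        } else {
            $embedded[] = $m[0];
        }
    }

    return ['external' => $external, 'embedded' => $embedded];
}

// Sample markup, made up for this example.
$html = '<body><script src="a.js"></script><p>Hi</p><script>var x = 1;</script></body>';
$scripts = split_scripts($html);
```

Each entry keeps the complete element as a string, so either list can be concatenated and re-inserted as is.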

So far I've tried phpDOM, Simple HTML DOM and Ganon.
I've had no luck with any of them: I can find links and remove or print them, but I fail with scripts every time.

This is an alternative to
https://stackoverflow.com/questions/23414887/php-simple-html-dom-strip-scripts-and-append-to-bottom-of-body
(Sorry to repost, but it's been 24 hours of trying and failing, using alternative libraries, failing more, etc.)

Based on the lovely regex answer from @alreadycoded.com, I managed to botch together the following:

$output = "<html><head></head><body><!-- Your stuff --></body></html>";
$content = '';
$js = '';

// 1) Grab <body>
preg_match_all('#(<body[^>]*>.*?<\/body>)#ims', $output, $body);
$content = implode('',$body[0]);

// 2) Find <script>s in <body>
preg_match_all('#<script(.*?)<\/script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= '<!-- Moved from [body] --> '.$value;
}

// 3) Remove <script>s from <body>
$content2 = preg_replace('#<script(.*?)<\/script>#is', '<!-- Moved to [/body] -->', $content); 

// 4) Add <script>s to bottom of <body>
$content2 = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content2);

// 5) Replace <body> with new <body>
$output = str_replace($content, $content2, $output);

This does the job and isn't that slow (a fraction of a second).

Shame none of the DOM stuff was working (or I wasn't up to wading through naff objects and manipulating them).
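
For comparison, here is a rough sketch of how the same move might look with PHP's built-in DOMDocument. It is not something tried in this post. The function name move_body_scripts_to_end and the sample markup are invented for the example. It assumes the dom extension is available and that the input is a non-empty, complete HTML document. Note that loadHTML and saveHTML may normalize the markup (for instance by adding a DOCTYPE) and can mangle non-ASCII text when no charset is declared.

```php
<?php
// Hypothetical helper (the name is illustrative): moves every <script> found
// inside <body> to the end of <body> using DOMDocument.
function move_body_scripts_to_end(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // No <body> element: return the input untouched.
    }

    // Copy the live node list into an array first; moving nodes while
    // iterating a live DOMNodeList can skip elements.
    $scripts = iterator_to_array($body->getElementsByTagName('script'), false);

    foreach ($scripts as $script) {
        // appendChild on a node that is already in the tree moves it.
        $body->appendChild($script);
    }

    return $doc->saveHTML();
}

// Sample markup, made up for this example.
$domInput = '<html><head></head><body><script src="a.js"></script><p>Hi</p><script>var x = 1;</script></body></html>';
$domOutput = move_body_scripts_to_end($domInput);
```

The scripts keep their original relative order because they are appended in document order.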

Recommended answer

$js = "";
$content = file_get_contents("http://website.com");
preg_match_all('#<script(.*?)</script>#is', $content, $matches);
foreach ($matches[0] as $value) {
    $js .= $value;
}
$content = preg_replace('#<script(.*?)</script>#is', '', $content); 
echo $content = preg_replace('#<body(.*?)</body>#is', '<body$1'.$js.'</body>', $content);
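
Two details of the snippet above are worth knowing. It collects scripts from the whole document, including <head>. It also passes the collected scripts through the replacement argument of preg_replace, where sequences such as $1 or \1 inside the JavaScript would be read as backreferences. The sketch below is one possible variant, not the answerer's code, that only touches <body> and concatenates the strings in a callback instead. The function name move_scripts_regex and the sample markup are invented for illustration.

```php
<?php
// Hypothetical helper (the name is illustrative): moves <script> elements
// found inside <body> to just before </body>, leaving <head> untouched.
function move_scripts_regex(string $html): string
{
    $result = preg_replace_callback(
        '#(<body\b[^>]*>)(.*?)(</body>)#is',
        function (array $m): string {
            $scripts = '';

            // Remove each script from the body content while collecting it.
            $inner = preg_replace_callback(
                '#<script\b.*?</script>#is',
                function (array $s) use (&$scripts): string {
                    $scripts .= $s[0];
                    return '';
                },
                $m[2]
            );
            if ($inner === null) {
                return $m[0]; // Regex error: leave this <body> unchanged.
            }

            // Plain concatenation, so "$1" or "\1" inside a script is kept
            // literally rather than being treated as a backreference.
            return $m[1] . $inner . $scripts . $m[3];
        },
        $html
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $html;
}

// Sample markup, made up for this example.
$regexInput = '<html><head><script src="h.js"></script></head><body><script>var price = "$1";</script><p>Hi</p></body></html>';
$regexOutput = move_scripts_regex($regexInput);
```

Like the original, this is still regex-based, so it can be confused by the text "</script>" or "</body>" appearing inside a script string or comment.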
