Remove everything within script and style tags
I have a variable named $articleText that contains HTML code. It includes script and style code inside <script> and <style> elements. I want to scan $articleText and remove these pieces of code. If I can also remove the actual <script>, </script>, <style> and </style> tags, I would do that too.
I imagine I need to use regex, but I am not skilled in it.
Can anyone assist?
I wish I could provide some code, but as I said, I am not skilled in regex, so I don't have anything to show.
I cannot use DOM. I specifically need to use regex against these particular tags.
Do not use regex on HTML. PHP provides a tool for parsing DOM structures, appropriately named DOMDocument.
<?php
// some HTML for example
$myHtml = '<html><head><script>alert("hi mom!");</script></head><body><style>body { color: red;} </style><h1>This is some content</h1><p>content is awesome</p></body><script src="someFile.js"></script></html>';

// create a new DOMDocument object
$doc = new DOMDocument();

// load the HTML into the DOMDocument object (this would be your source HTML)
$doc->loadHTML($myHtml);

removeElementsByTagName('script', $doc);
removeElementsByTagName('style', $doc);
removeElementsByTagName('link', $doc);

// output cleaned html
echo $doc->saveHTML();

// remove every element with the given tag name from the document
function removeElementsByTagName($tagName, $document) {
    $nodeList = $document->getElementsByTagName($tagName);
    // iterate backwards: the node list is live and shrinks as nodes are removed
    for ($nodeIdx = $nodeList->length; --$nodeIdx >= 0; ) {
        $node = $nodeList->item($nodeIdx);
        $node->parentNode->removeChild($node);
    }
}
You can try it here: https://eval.in/private/4f225fa0dcb4eb
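The loop in removeElementsByTagName runs backwards because getElementsByTagName returns a live DOMNodeList: each removal shrinks the list, so a forward loop would skip elements. As a minimal sketch of an alternative, the list can first be copied into a plain array with iterator_to_array. The helper name removeElementsSnapshot and the sample markup are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (illustration only): same effect as
// removeElementsByTagName, but it snapshots the live node list first.
function removeElementsSnapshot(string $tagName, DOMDocument $document): int
{
    // iterator_to_array copies the live DOMNodeList into a static array,
    // so removing nodes no longer changes what we iterate over.
    $nodes = iterator_to_array($document->getElementsByTagName($tagName));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes); // number of elements removed
}

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<html><head><script>a();</script><script>b();</script></head><body><p>kept</p></body></html>');

$removed = removeElementsSnapshot('script', $doc);
$cleaned = $doc->saveHTML();
echo $cleaned;
```

Either form removes the same nodes; the snapshot version simply trades a little memory for a loop that reads in natural order.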
Documentation
- DomDocument: http://php.net/manual/en/class.domdocument.php
- DomNodeList: http://php.net/manual/en/class.domnodelist.php
- DomDocument::getElementsByTagName: http://us3.php.net/manual/en/domdocument.getelementsbytagname.php
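If the constraint in the question really rules out DOM, a regex can approximate the same result for simple, well-formed markup. The following is only a best-effort sketch under that assumption: the helper name stripScriptAndStyle and the sample $articleText value are hypothetical, and the pattern should not be relied on as a security filter, since malformed or deliberately crafted HTML (for example an unclosed <script> tag) can slip past it.

```php
<?php
// Hypothetical helper (illustration only): strips <script> and <style>
// elements, tags and contents, using a regular expression.
function stripScriptAndStyle(string $html): string
{
    // <(script|style)\b[^>]*>  opening tag with optional attributes
    // .*?                      contents, non-greedy so each element stops at its own closing tag
    // <\/\1\s*>                closing tag matching whichever name group 1 captured
    // Modifiers: i = case-insensitive, s = dot also matches newlines
    $pattern = '/<(script|style)\b[^>]*>.*?<\/\1\s*>/is';

    $result = preg_replace($pattern, '', $html);
    if ($result === null) {
        // preg_replace returns null on error (for example, backtrack limit reached)
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $result;
}

$articleText = '<h1>Title</h1><script type="text/javascript">alert("hi");</script>'
    . "<STYLE>\nbody { color: red; }\n</STYLE><p>Text</p>";

echo stripScriptAndStyle($articleText), PHP_EOL;
// prints: <h1>Title</h1><p>Text</p>
```

The backreference \1 keeps a <script> block from being closed by a </style> tag, and the non-greedy .*? keeps two separate blocks from being merged into one match along with the content between them.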