echo innerHTML, without outer node tags
Question
I'm using the DOMDocument class to parse a fairly unpredictable string of markup. It's not all that well-formed, and I need some data from it. Regexes are right out, of course.
So far, I've got this:
$dom = new DOMDocument;
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveXML($contents);
Now this gives me:
<body>
<p>What I'm really after</p>
<ul><li>Foo</li><li>Bar</li></ul>
<h6>And so on</h6>
</body>
What really annoys me are those <body> tags. I want them gone. After grazing the web, I've stumbled across the weirdest workarounds, some more hacky than others, so in the end I settled for:
echo substr($dom->saveXML($contents), 6, -7);
It still feels hacky to me, but it's the best I could find. Is there a more reliable way of getting the innerHTML of the DOM, starting from a given node, without that node's own tags showing up?
I've seen suggestions using regexes (a no-no, IMHO), or even looping through all the children, echoing those that have childNodes of their own, and stringing together those that don't:
if ($contents->hasChildNodes())
{
    $children = $contents->getElementsByTagName('*');
    foreach ($children as $child)
    {
        if ($child->hasChildNodes() || $child->nodeName === 'br')
        {//or isset($standaloneNodes[$child->nodeName])
            echo $dom->saveXML($child);
            continue;
        }
        echo '<'.$child->nodeName.'>'.$child->nodeValue.'</'.$child->nodeName.'>';
    }
}
But that, to me, seems even more absurd...
Recommended answer
When exporting HTML, you must have a single root element. In most cases, the most useful one is the body. Since you're loading in an HTML fragment, you know for certain that the generated <body> tag won't have any attributes, so the opening tag is always exactly 6 characters and the closing tag exactly 7. The substr(..., 6, -7) is therefore perfectly predictable and fine.
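If the fixed offsets still feel uncomfortable, one alternative is to serialize each direct child of the node and concatenate the results, so the outer tags are never produced in the first place. The following is only a minimal sketch: the innerHTML() helper name is hypothetical (it is not part of DOMDocument), the sample $str is taken from the output shown in the question, and the call to libxml_use_internal_errors() is an assumption added to keep parser warnings quiet on badly formed markup.

```php
<?php
// Hypothetical helper (not part of the DOM extension): concatenates the
// serialized form of each direct child of $node, so the node's own
// opening and closing tags never appear in the result.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Passing a node to saveHTML() (PHP >= 5.3.6) serializes only that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumption: collect parser warnings internally instead of emitting them,
// since the real input is described as not well-formed.
libxml_use_internal_errors(true);

// Sample fragment, taken from the output shown in the question.
$str = "<p>What I'm really after</p><ul><li>Foo</li><li>Bar</li></ul><h6>And so on</h6>";

$dom = new DOMDocument;
$dom->loadHTML($str);
$body = $dom->getElementsByTagName('body')->item(0);

echo innerHTML($body), PHP_EOL;
```

This works for any node, including ones that do carry attributes, at the cost of a loop. For the <body> of a loaded fragment, the substr() call remains the shorter option.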