echo innerHTML, without outer node tags


Question

I'm using the DOMDocument class to parse a fairly unpredictable string of markup. It's not all that well formed, and I need some data from it. Regexes are right out, of course.

So far, I've got this:

$dom = new DOMDocument;
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveXML($contents);

This gives me:

<body>
    <p>What I'm really after</p>
    <ul><li>Foo</li><li>Bar</li></ul>
    <h6>And so on</h6>
</body>

What really annoys me are those <body> tags. I want them gone. After grazing the web, I've stumbled across the weirdest workarounds, some hackier than others. In the end, I settled for:

echo substr($dom->saveXML($contents), 6, -7);

It still feels hacky to me, but it's the best I could find. Is there a more reliable way of getting the innerHTML of the DOM, starting from a given node, without the corresponding tags actually showing up?

I've seen suggestions using regexes (a no-no, IMHO), or even looping through all the children, echoing those that have childNodes of their own, and stringing together those that don't:

if ($contents->hasChildNodes())
{
    $children = $contents->getElementsByTagName('*');
    foreach($children as $child)
    {
        if ($child->hasChildNodes() || $child->nodeName === 'br')
        {//or isset($standaloneNodes[$child->nodeName])
            echo $dom->saveXML($child);
            continue;
        }

        echo '<'.$child->nodeName.'>'.$child->nodeValue.'</'.$child->nodeName.'>';
    }
}

But that, to me, seems even more absurd...

Recommended answer

When exporting HTML, you must have a single root element. In most cases, the most useful one is the body. Since you're loading an HTML fragment, you know for certain that the generated <body> tag won't have any attributes, so the serialized string always starts with the 6-character <body> and ends with the 7-character </body>. That makes substr(..., 6, -7) perfectly predictable and fine.
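
If the hard-coded offsets still feel brittle, one alternative (not part of the answer above) is to serialize each direct child of the node and concatenate the results, so the wrapper tag is never emitted in the first place. The sketch below is only an illustration under some assumptions: the helper name innerHTML() is hypothetical, the sample $str is made up to match the output shown in the question, and it uses saveHTML() with a node argument instead of the saveXML() call from the question.

```php
<?php
// Hypothetical helper (not from the original post): returns the markup of a
// node's children without the node's own opening and closing tags.
function innerHTML(DOMNode $node): string
{
    $html = '';
    // childNodes holds only the direct children, so nested elements are
    // serialized once, as part of their parent.
    foreach ($node->childNodes as $child) {
        // DOMDocument::saveHTML() accepts an optional node to serialize.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample fragment, assumed for illustration only.
$str = '<p>What I\'m really after</p>'
     . '<ul><li>Foo</li><li>Bar</li></ul>'
     . '<h6>And so on</h6>';

$dom = new DOMDocument;
$dom->loadHTML($str); // libxml wraps the fragment in <html><body>
$contents = $dom->getElementsByTagName('body')->item(0);

echo innerHTML($contents);
```

Because the helper never produces the wrapper tag, it does not depend on the tag's length and would keep working for a node other than body, or for one that carries attributes. For the fixed body-of-a-fragment case, the substr() approach remains shorter.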
