Resolve namespaces with SimpleXML regardless of structure or namespace


Question

I got a Google Shopping feed like this (extract):

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">  
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>

Now, SimpleXML can read the "title" and "description" tags, but it can't read the tags with the "g:" prefix.

There are solutions on Stack Overflow for this specific case, using the "children" function. But I don't only want to read Google Shopping XML; I need it to be independent of structure or namespace. I don't know anything about the file (I recursively loop through the nodes as a multidimensional array).

Is there a way to do this with SimpleXML? I could replace the colons, but I want to be able to store the array and reassemble the XML (in this case specifically for Google Shopping), so I do not want to lose information.

Recommended answer

You want to use SimpleXMLElement to extract data from XML and convert it into an array.

This is generally possible but comes with some caveats. Before getting to XML namespaces: your XML contains CDATA sections. To convert XML to an array with SimpleXML, you need to convert CDATA to text when you load the XML string. This is done with the LIBXML_NOCDATA flag. Example:

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
print_r($xml); // print_r shows how SimpleXMLElement does array conversion

This gives you the following output:

SimpleXMLElement Object
(
    [@attributes] => Array
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
)

As you can already see, there is no natural way to represent attributes in an array, so SimpleXML by convention puts them under the @attributes key.
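
As a small sketch of that convention (the sample string is a simplified assumption, not the original feed), the attributes are reachable through attributes() on the object and through the @attributes key after an array cast:

```php
<?php
// Simplified stand-in for the feed; the content is illustrative only.
$buffer = '<rss version="2.0"><title><![CDATA[Blah]]></title></rss>';

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// Object access: attributes() returns the element's attributes.
$version = (string) $xml->attributes()->version;

// Array access: the cast places all attributes under the "@attributes" key.
$array = (array) $xml;
$versionFromArray = $array['@attributes']['version'];

echo $version, "\n";
echo $versionFromArray, "\n";
```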

The other problem you have is handling multiple XML namespaces. In the previous example no specific namespace was given, so only the elements without a prefix (the default namespace, in this answer's terms) were included. When you convert a SimpleXMLElement to an array, the namespace of that SimpleXMLElement is used. As none was explicitly specified, the default namespace was taken.

But if you specify a namespace when you create the SimpleXMLElement, that namespace is used for the array conversion.

Example:

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, "http://base.google.com/ns/1.0");
print_r($xml);

This gives you the following output:

SimpleXMLElement Object
(
    [id] => Blah
    [product_type] => Blah
)

As you can see, this time the namespace that was specified when the SimpleXMLElement was created is used in the array conversion: http://base.google.com/ns/1.0.
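
For comparison, here is a minimal sketch of the "children" approach mentioned in the question, which selects a namespace on an already loaded object instead of at load time. The sample feed is a simplified assumption with made-up values:

```php
<?php
// Simplified stand-in for the feed; the content is illustrative only.
$buffer = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Title]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// Select children by namespace URI ...
$byUri = (string) $xml->children('http://base.google.com/ns/1.0')->id;

// ... or by the prefix used in the document (second argument true).
$byPrefix = (string) $xml->children('g', true)->id;

// Children without a namespace are reachable directly.
$title = (string) $xml->title;

echo $byUri, ' ', $byPrefix, ' ', $title, "\n";
```

Selecting by URI is usually the more robust choice, because a feed is free to bind the same namespace to a different prefix.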

Since you write that you want to take all namespaces in the document into account, you need to obtain them first, including the default one:

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
$namespaces = [null] + $xml->getDocNamespaces(true);
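
To make the shape of that list concrete, here is a small sketch on a simplified one-element document (an assumption for illustration). getDocNamespaces(true) returns prefix => URI pairs, and the array union puts null in front at index 0:

```php
<?php
// Simplified stand-in for the feed; the content is illustrative only.
$buffer = '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"><g:id>Blah</g:id></rss>';
$xml = simplexml_load_string($buffer);

// Keys are the prefixes declared in the document, values the namespace URIs.
$declared = $xml->getDocNamespaces(true);

// Array union keeps the existing keys, so null lands at index 0 in front.
$namespaces = [null] + $declared;

var_dump($namespaces);
```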

Then you can iterate over all namespaces and recursively merge the results into the same array, as shown below:

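$array = [];
foreach ($namespaces as $namespace) {
    // An empty string selects the elements without a namespace; the cast
    // avoids passing null to a string parameter (deprecated since PHP 8.1).
    $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, (string) $namespace);
    // Merge the children of this namespace into the combined array.
    $array = array_merge_recursive($array, (array) $xml);
}
print_r($array);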

This should finally create and output the array you are after:

Array
(
    [@attributes] => Array
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
    [id] => Blah
    [product_type] => Blah
)

As you can see, this is perfectly possible with SimpleXMLElement. However, it is important that you understand how SimpleXMLElement converts into an array (or serializes to JSON, which follows the same rules). To preview the SimpleXMLElement-to-array conversion, you can use print_r for quick output.

Note that not all XML constructs convert equally well into an array. That is not specifically a limitation of SimpleXML; it lies in the nature of which structures XML can represent and which structures an array can represent.
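
One such construct, sketched on a made-up document: when differently named siblings are interleaved, the conversion groups same-named elements under one key, so the original sequence cannot be recovered from the array or JSON form.

```php
<?php
// Illustrative document: <a> and <b> elements are interleaved.
$xml = simplexml_load_string('<list><a>1</a><b>2</b><a>3</a></list>');

// Same-named siblings are grouped under one key, so the original
// a, b, a sequence can no longer be recovered from the result.
$json = json_encode($xml);
echo $json, "\n";

// The object itself still knows the document order.
$names = [];
foreach ($xml->children() as $name => $child) {
    $names[] = $name;
}
echo implode(',', $names), "\n";
```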

Therefore it is most often better to keep the XML inside an object like SimpleXMLElement (or DOMDocument) to access and work with the data, rather than using an array.

However, it is perfectly fine to convert data into an array as long as you know what you are doing and you don't need to write much code to access members deeper down in the tree. Otherwise SimpleXMLElement is to be favored over an array, because it gives dedicated access to many XML features and also lets you query the document like a database with the SimpleXMLElement::xpath method. You would need to write many lines of your own code to access data inside the XML tree that comfortably with an array.
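
A minimal sketch of such a query on a simplified feed (the sample values and the "feed" prefix are assumptions chosen for illustration). The prefix registered for XPath does not have to match the one used in the document; only the URI matters:

```php
<?php
// Simplified stand-in for the feed; the content is illustrative only.
$buffer = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Title]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// Bind a prefix of our own choosing to the namespace URI for XPath use.
$xml->registerXPathNamespace('feed', 'http://base.google.com/ns/1.0');

// Find every <id> element in that namespace, anywhere in the document.
$ids = $xml->xpath('//feed:id');
echo count($ids), ' ', (string) $ids[0], "\n";
```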

To get the best of both worlds, you can extend SimpleXMLElement for your specific conversion needs:

$buffer = <<<BUFFER
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>
</rss>
BUFFER;

$feed = new Feed($buffer, LIBXML_NOCDATA);
print_r($feed->toArray());

Output:

Array
(
    [@attributes] => stdClass Object
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
    [id] => Blah
    [product_type] => Blah
    [@text] => ...
)

With the following underlying implementation:

class Feed extends SimpleXMLElement implements JsonSerializable
{
    // Suppresses the PHP 8.1+ notice about the missing return type.
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $array = array();

        // json encode attributes if any.
        if ($attributes = $this->attributes()) {
            $array['@attributes'] = iterator_to_array($attributes);
        }

        $namespaces = [null] + $this->getDocNamespaces(true);
        // json encode child elements if any. group on duplicate names as an array.
        foreach ($namespaces as $namespace) {
            foreach ($this->children($namespace) as $name => $element) {
                if (isset($array[$name])) {
                    if (!is_array($array[$name])) {
                        $array[$name] = [$array[$name]];
                    }
                    $array[$name][] = $element;
                } else {
                    $array[$name] = $element;
                }
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim($this);
        if (strlen($text)) {
            if ($array) {
                $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags)
        if (!$array) {
            $array = NULL;
        }

        return $array;
    }

    public function toArray() {
        return (array) json_decode(json_encode($this));
    }
}

This is an adaptation, with namespace support, of the "Changing JSON Encoding Rules" example given in "SimpleXML and JSON Encode in PHP – Part III and End".
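
The conversions above drop the prefix ("g:id" becomes "id"), which matters if the array is later used to reassemble the XML. The following is a rough sketch of one way to keep that information in the keys. xmlToPrefixedArray is a hypothetical helper, not part of the original answer; it ignores attributes and assumes the document declares no default (unprefixed) namespace.

```php
<?php
// Hypothetical helper (not part of the original answer): converts an element
// into a nested array whose keys keep the namespace prefix, e.g. "g:id".
function xmlToPrefixedArray(SimpleXMLElement $node, array $namespaces)
{
    $result = [];
    foreach ($namespaces as $prefix => $uri) {
        foreach ($node->children($uri) as $name => $child) {
            // Index 0 holds null (no namespace); string keys are prefixes.
            $key = is_string($prefix) && $prefix !== '' ? "$prefix:$name" : $name;
            // Every key maps to a list so repeated elements need no special case.
            $result[$key][] = xmlToPrefixedArray($child, $namespaces);
        }
    }

    $text = trim((string) $node);
    if ($result === []) {
        return $text; // leaf element: just its text
    }
    if ($text !== '') {
        $result['@text'] = $text; // mixed content: keep the text as well
    }
    return $result;
}

// Simplified stand-in for the feed; the content is illustrative only.
$buffer = <<<XML
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<title><![CDATA[Title]]></title>
<g:id><![CDATA[Blah]]></g:id>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
$namespaces = [null] + $xml->getDocNamespaces(true);

$array = xmlToPrefixedArray($xml, $namespaces);
print_r($array);
```

Storing the $namespaces map next to the array keeps the prefix-to-URI bindings available for writing the document back out.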
