PHP simplexml xpath search for value in an ELEMENT containing tab delimited text?


Question

How can I use PHP SimpleXML xpath to search for a text value in a tab-delimited ELEMENT and return text from that same element at a different offset from where the search text was found?

Let's say I wish to find the DATA element containing a Value of '2' and return the LongValue 'Academy'.

The XML document has the following format:

    <METADATA Resource="Property" Lookup="Area"> 
    <COLUMNS>->fieldname *(->fieldname)-></COLUMNS>
    *(<DATA>->fielddata *(->fielddata)-></DATA>) 
    </METADATA>

   Note: ignore spaces
         *()  means 1 or more
         -> is tab chr(9)

In the example below, the COLUMNS element contains three column names (LongValue, ShortValue, Value), which can appear in any order.

Each DATA element has 3 corresponding tab-delimited text values. For example, the first DATA element below contains:

    LongValue = 'Salado'
    ShortValue = 'Sal' 
    Value = '5' 

Here is the XML document:

<METADATA Resource="Property" Lookup="Area">
<COLUMNS>   LongValue   ShortValue  Value   </COLUMNS>
<DATA>  Salado  Sal 5   </DATA>
<DATA>  Academy Aca 2   </DATA>
<DATA>  Rogers  Rog 1   </DATA>
<DATA>  Bartlett    Bar 4   </DATA>
</METADATA>

Note: the COLUMNS and DATA elements contain tab-delimited text for 3 columns, where each column starts with a tab followed by text, and there is one last tab at the end.

Here is what I have in mind:

1.) Preferably find the offset of the column named 'Value' in the COLUMNS element before trying to find the corresponding text in a DATA element, because the 'Value' column can be at any position; the text in the DATA elements follows the same order.

2.) Search for a DATA element containing the text in the 'Value' column and return the text from the 'LongValue' column.

Here's an example of an xpath search that somewhat works but is flawed: it does not take into account the offset of the Value column in the COLUMNS element, so it cannot reliably find the corresponding position of the 'Value' column in the DATA element.

Here's a code snippet:

$xml_text = 'the xml document above';
$xml = simplexml_load_string($xml_text); //load the xml document
$resource = 'Property'; //value for the Resource attribute in METADATA
$lookup = 'Area'; //value for the Lookup attribute in METADATA
$value = '2'; //the needle we are looking for

$find = "\t" . $value . "\t";
/* 
 Adding tabs before and after the $value may be flawed. Each column 
 starts with a tab followed by text, and only the last column has 
 an extra tab after it. Not sure this would work properly if the column 
 was in the middle, or if the ELEMENT happened to contain $value 
 more than once. */

/* 
 Search for a specific METADATA element with matching 
 Resource and Lookup attributes */
$node = $this->xml->xpath(
             "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            ."/DATA[contains(., '{$find}')]"
        ); 

$x = explode("\t", trim((string) $node[0])); //convert the tab delimited 
                                             //string to an array

print_r($x); //this shows what the array would look like; 
             //without the trim there would be empty 
             //first and last array elements
/* Output:
Array
(
    [0] => Academy
    [1] => Aca
    [2] => 2
)
*/

$LongValue = $x[0]; //assuming the LongValue is in the first column

echo $LongValue; //this shows the LongValue returned: Academy
Thanks for your help!

Update... After posting, I came up with this:

//get index of the 'LongValue' column from the COLUMNS element
$node = $this->xml->xpath(
             "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            ."/COLUMNS");
if($node) {

    //array of column names
    $columns = explode("\t", strtolower((string) trim($node[0]))); 

    $long_value_index = array_search('longvalue', $columns);

} else {
    echo 'not found';
    exit;
}

Now, with $long_value_index, this can return the LongValue from the proper offset:

$LongValue = $x[$long_value_index]; 
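
Putting the two snippets together, the offset-aware lookup could be sketched roughly as below. This is only a minimal sketch under assumptions: lookup_by_column() is a hypothetical helper name, the XML string rebuilds the sample document with real tab characters, and the fields are compared in plain PHP rather than inside the xpath expression.

```php
<?php
// Sample document from the question; "\t" is a literal tab character.
$xmlText = "<METADATA Resource='Property' Lookup='Area'>"
    . "<COLUMNS>\tLongValue\tShortValue\tValue\t</COLUMNS>"
    . "<DATA>\tSalado\tSal\t5\t</DATA>"
    . "<DATA>\tAcademy\tAca\t2\t</DATA>"
    . "<DATA>\tRogers\tRog\t1\t</DATA>"
    . "<DATA>\tBartlett\tBar\t4\t</DATA>"
    . "</METADATA>";

/**
 * Hypothetical helper (not part of SimpleXML): find the first DATA row whose
 * $searchColumn field equals $needle and return its $returnColumn field.
 */
function lookup_by_column(SimpleXMLElement $xml, $resource, $lookup, $searchColumn, $needle, $returnColumn)
{
    $nodes = $xml->xpath("//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']");
    if (!$nodes) {
        return null; // no matching METADATA element
    }
    $metadata = $nodes[0];

    // Column names in document order, without the leading/trailing tab.
    $columns = explode("\t", trim((string) $metadata->COLUMNS));
    $searchIndex = array_search($searchColumn, $columns, true);
    $returnIndex = array_search($returnColumn, $columns, true);
    if ($searchIndex === false || $returnIndex === false) {
        return null; // unknown column name
    }

    foreach ($metadata->DATA as $data) {
        $values = explode("\t", trim((string) $data));
        // Compare the whole field, so '2' does not match '12' or '20'.
        if (isset($values[$searchIndex]) && $values[$searchIndex] === (string) $needle) {
            return isset($values[$returnIndex]) ? $values[$returnIndex] : null;
        }
    }
    return null; // no row matched
}

$xml = simplexml_load_string($xmlText);
echo lookup_by_column($xml, 'Property', 'Area', 'Value', '2', 'LongValue'), "\n"; // Academy
```

Because both indexes are looked up by column name, the sketch keeps working when the COLUMNS order changes.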

Any ideas?

Recommended answer

You are already quite far along, and you have analyzed the data you need to deal with well. The way you want to parse the data also looks fine to me. The one thing that can probably be improved a little is to take care not to do too much at once.

One way to do that is to divide the problem into smaller ones. I will show you how that works by putting code into multiple functions and methods. But let's start with a single function and go step by step, so you can follow the examples and build this up.

One way to separate problems in PHP is to use functions. For example, write one function that searches the XML document; this makes the code look better and more expressive:

/**
 * search metadata element
 *
 *
 * @param SimpleXMLElement $xml
 * @param string           $resource metadata attribute
 * @param string           $lookup   metadata attribute
 * @param string           $value    search value
 *
 * @return SimpleXMLElement
 */
function metadata_search(SimpleXMLElement $xml, $resource, $lookup, $value) {

    $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            ."/DATA[contains(., '{$value}')]";

    list($element)= $xml->xpath($xpath);

    return $element;
}

So now you can easily search the document, and the parameters are named and documented. All that is needed is to call the function and use the return value:

$data = metadata_search($xml, 'Property', 'Area', 2);

This might not be the perfect function, but it already serves as an example. Besides functions, you can also create objects. Objects bundle functions with their own context; such functions are called methods because they belong to the object. The xpath() method of SimpleXMLElement is one example.

If you look at the function above, the first parameter is the $xml object, and the xpath method is executed on it. In the end, all this function really does is create and run the xpath query based on the input variables.

If we could bring that function directly into the $xml object, we would no longer need to pass the object as the first parameter. That is the next step, and it works by extending SimpleXMLElement. We add one new method that does the search; it is pretty much the same as the function above. Extending SimpleXMLElement means we create a sub-type of it: it has everything SimpleXMLElement already has, plus the new method:

class MetadataElement extends SimpleXMLElement
{
    /**
     * @param string           $resource metadata attribute
     * @param string           $lookup   metadata attribute
     * @param string           $value    search value
     *
     * @return SimpleXMLElement
     */
    public function search($resource, $lookup, $value) {
        $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            ."/DATA[contains(., '{$value}')]";

        list($element)= $this->xpath($xpath);

        return $element;
    }
}

To bring this to life, we need to provide the name of this class when loading the XML string. Then the search method can be called directly:

$xml  = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 2);

Voila, the search is now part of the SimpleXMLElement!
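
To see why passing a class name is enough, here is a small sketch. DemoElement and its tabFields() method are hypothetical names used only for this demonstration. It illustrates that elements reached through property access and through xpath() are instances of the class given to simplexml_load_string(), so the added method is available on all of them.

```php
<?php
// Hypothetical subclass used only for this demonstration.
class DemoElement extends SimpleXMLElement
{
    // Split the element's own tab-delimited text into fields.
    public function tabFields()
    {
        return explode("\t", trim((string) $this));
    }
}

$xml = simplexml_load_string(
    "<METADATA><DATA>\tAcademy\tAca\t2\t</DATA></METADATA>",
    'DemoElement'
);

$viaProperty = $xml->DATA;
list($viaXpath) = $xml->xpath('//DATA');

var_dump($viaProperty instanceof DemoElement); // child access keeps the class
var_dump($viaXpath instanceof DemoElement);    // so do xpath() results
print_r($viaXpath->tabFields());
```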

But what to do with this $data? It's just an XML element, and it still contains the tabs.

Even worse, the context is lost: which metadata column does a given value belong to? That is your problem, so we need to solve it next. But how?

Honestly, there are many ways to do that. One idea I had was to create a table object out of the XML, based on a metadata element:

list($metadata) = $xml->xpath('//METADATA[1]');
$csv = new CsvTable($metadata);
echo $csv;

It even comes with nice debug output:

+---------+----------+-----+
|LongValue|ShortValue|Value|
+---------+----------+-----+
|Salado   |Sal       |5    |
+---------+----------+-----+
|Academy  |Aca       |2    |
+---------+----------+-----+
|Rogers   |Rog       |1    |
+---------+----------+-----+
|Bartlett |Bar       |4    |
+---------+----------+-----+
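
The CsvTable class itself is not shown in this answer. Purely as an assumption about how such a class might look, here is a minimal sketch that reads the COLUMNS and DATA lines into rows and renders a grid like the one above in __toString(). All names and the internal layout are hypothetical.

```php
<?php
/**
 * Hypothetical CsvTable: the answer only names this class, so everything
 * below is an assumption about how it might be written.
 */
class CsvTable
{
    private $rows = array();

    public function __construct(SimpleXMLElement $metadata)
    {
        // The first row holds the column names, the rest the DATA lines.
        $this->rows[] = explode("\t", trim((string) $metadata->COLUMNS));
        foreach ($metadata->DATA as $data) {
            $this->rows[] = explode("\t", trim((string) $data));
        }
    }

    public function __toString()
    {
        // The widest cell of each column decides the column width.
        $widths = array();
        foreach ($this->rows as $row) {
            foreach ($row as $i => $cell) {
                $widths[$i] = max(isset($widths[$i]) ? $widths[$i] : 0, strlen($cell));
            }
        }

        $border = '+';
        foreach ($widths as $width) {
            $border .= str_repeat('-', $width) . '+';
        }

        $out = $border . "\n";
        foreach ($this->rows as $row) {
            $line = '|';
            foreach ($widths as $i => $width) {
                $line .= str_pad(isset($row[$i]) ? $row[$i] : '', $width) . '|';
            }
            $out .= $line . "\n" . $border . "\n";
        }
        return $out;
    }
}

$metadata = simplexml_load_string(
    "<METADATA Resource='Property' Lookup='Area'>"
    . "<COLUMNS>\tLongValue\tShortValue\tValue\t</COLUMNS>"
    . "<DATA>\tSalado\tSal\t5\t</DATA>"
    . "<DATA>\tAcademy\tAca\t2\t</DATA>"
    . "</METADATA>"
);
echo new CsvTable($metadata);
```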

But that is a lot of work if you are not yet fluent with programming objects, so building a whole table model on its own is maybe a bit much.

Therefore I had another idea: why not continue to use the XML object you already have and change the XML in it a bit, so that it is in a better format for your purposes? From:

<METADATA Resource="Property" Lookup="Area">
  <COLUMNS>   LongValue   ShortValue  Value   </COLUMNS>
  <DATA>  Salado  Sal 5   </DATA>

To:

<METADATA Resource="Property" Lookup="Area" transformed="1">
    <COLUMNS>   LongValue   ShortValue  Value   </COLUMNS>
    <DATA>
        <LongValue>Salado</LongValue><ShortValue>Sal</ShortValue><Value>5</Value>
    </DATA>

This would allow you not only to search by a specific column name but also to read the other values in the data element. If the search returns the $data element:

$xml  = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 5);
echo $data->Value;     # 5
echo $data->LongValue; # Salado

If we leave an additional attribute on the metadata element, we can convert these elements while we search. If some data is found and the element is not yet converted, it will be converted.

Because we do all of this inside the search method, the code using the search method does not need to change much (if at all; that depends a bit on your detailed needs, which I might not have fully grasped, but I think you get the idea). So let's put this to work. Because we don't want to do it all at once, we create multiple new methods to:

  1. Transform the metadata element
  2. Search inside the original element (this code we have already, we just move it)

Along the way we will also create methods we deem helpful. You will notice that this is partly code you have already written (like in search()); it is just placed inside the $xml object now, where it more naturally belongs.

Finally, these new methods will be put together in the existing search() method.

So first of all, we create a helper method to parse a tabbed line into an array. It's basically your code; the only difference is that you do not need the string cast in front of trim. Because this function is only needed internally, we make it private:

private function asExplodedString() {
    return explode("\t", trim($this));
}
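
As a quick illustration of what the trim() buys here, the following sketch splits one sample line with and without it. The sample line is written out by hand with "\t" for the tabs.

```php
<?php
// One DATA line: leading tab, three fields, trailing tab.
$line = "\tSalado\tSal\t5\t";

// Without trim() the leading and trailing tab yield empty first and last items.
$raw = explode("\t", $line);

// trim() strips surrounding whitespace (tabs included) before splitting.
$fields = explode("\t", trim($line));

var_dump(count($raw)); // int(5)
var_dump($fields);     // Salado, Sal, 5
```

One caveat of this approach: if the first or last column of a line were empty, trim() would swallow that empty field as well.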

As the name asExplodedString() suggests, it gives back the tab-exploded array of the element itself. Remember, we are inside $xml, so now every XML element has this method. If you do not fully understand this yet, just read on; you can see how it works right below. We only add one more method as a helper:

public function getParent() {
    list($parent) = $this->xpath('..') + array(0 => NULL);
    return $parent;
}
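
The `+ array(0 => NULL)` part of getParent() is a fallback for the case where xpath('..') finds nothing, so that list() always has an index 0 to read. A small sketch of that array-union idiom with plain arrays (the string value merely stands in for an element):

```php
<?php
// Stand-ins for what xpath('..') might return: no match versus one match.
$noMatch  = array();
$oneMatch = array('parent-element');

// The + operator keeps the keys of the left array and only adds missing ones,
// so key 0 comes from the right-hand side only when the left side has none.
list($a) = $noMatch + array(0 => null);
list($b) = $oneMatch + array(0 => null);

var_dump($a); // NULL
var_dump($b); // the existing first item
```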

The getParent() method allows us to retrieve the parent element of an element. This is useful because when we find a data element we want to transform the metadata element, which is its parent. Because this method is of general use, I have chosen to make it public, so it can also be used in outside code. It solves a common problem and is therefore not as specific as the explode method.

So now we want to transform a metadata element. It takes a few more lines of code than the two helper methods above, but thanks to those helpers it will not be complicated.

We just assume that the element this method is called on is the metadata element. We do not add checks here, to keep the code small. As this is a private function again, we do not even need to check: if this method is invoked on the wrong element, the fault was made inside the class itself, not in outside code. This is also a nice example of why I use private methods here.

What we do with the metadata element is actually quite simple: we fetch the COLUMNS element inside, explode the column names, and then go over each DATA element, explode its data as well, and empty the DATA element only to add the column-named children to it. Finally we add an attribute to mark the element as transformed:

private function transform() {
    $columns = $this->COLUMNS->asExplodedString();

    foreach ($this->DATA as $data) {
        $values  = $data->asExplodedString();
        $data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
        foreach ($columns as $index => $name) {
            $data->addChild($name, $values[$index]);
        }
    }

    $this['transformed'] = 1;
}
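
One thing to keep in mind with this transformation: addChild() expects its value to be XML-escaped already, so a field containing an ampersand needs escaping first. The sketch below repeats the same steps outside the class on an invented sample value ("Smith & Sons" is not from the original data) and escapes each field with htmlspecialchars() before adding it. The column names must also be valid XML element names, which the sketch simply assumes.

```php
<?php
// Sample with an ampersand in a field; the value is made up for illustration.
$metadata = simplexml_load_string(
    "<METADATA><COLUMNS>\tLongValue\tValue\t</COLUMNS>"
    . "<DATA>\tSmith &amp; Sons\t7\t</DATA></METADATA>"
);

$columns = explode("\t", trim((string) $metadata->COLUMNS));

foreach ($metadata->DATA as $data) {
    $values  = explode("\t", trim((string) $data));
    $data[0] = ''; // empty the element's own text, as in transform()
    foreach ($columns as $index => $name) {
        // Escape & and < so addChild() receives well-formed content.
        $data->addChild($name, htmlspecialchars($values[$index], ENT_NOQUOTES));
    }
}

echo $metadata->DATA->LongValue, "\n"; // Smith & Sons
echo $metadata->asXML(), "\n";
```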

Okay, so what does that give us? Let's test this. To do that, we modify the existing search function to return the transformed data element by adding a single line of code:

public function search($resource, $lookup, $value) {
    $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
        . "/DATA[contains(., '{$value}')]";

    list($element) = $this->xpath($xpath);

    $element->getParent()->transform();
    ###################################

    return $element;
}

And then we output it as XML:

$data = $xml->search('Property', 'Area', 2);
echo $data->asXML();

This now gives the following output (beautified; it is normally on a single line):

<DATA>
  <LongValue>Academy</LongValue>
  <ShortValue>Aca</ShortValue>
  <Value>2</Value>
</DATA>

Let's also check that the new attribute is set and that all other data elements of that metadata block are transformed as well:

echo $data->getParent()->asXML();

And the output (beautified):

<METADATA Resource="Property" Lookup="Area" transformed="1">
  <COLUMNS> LongValue   ShortValue  Value   </COLUMNS>
  <DATA>
    <LongValue>Salado</LongValue>
    <ShortValue>Sal</ShortValue>
    <Value>5</Value>
  </DATA>
  ...

This shows that the code works as intended. It might already solve your issue, for example if you always search for a number, the other columns do not contain numbers, and you only need one search per metadata block. Most likely it does not, however, so the search function needs to be changed to perform the correct search and the transformation internally.

This time we again make use of $this to put methods on the concrete XML element. Two new methods are needed. One gets a metadata element based on its attributes:

private function getMetadata($resource, $lookup) {
    $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
    list($metadata) = $this->xpath($xpath);
    return $metadata;
}

And one searches a specific column of a metadata element:

private function searchColumn($column, $value) {
    return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
}
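
Note that contains() is a substring test, so searching for 2 would also match values such as 12 or 20. If an exact match is what you need, an equality comparison in the predicate is one alternative. The sketch below compares both on already-transformed sample data; the values 12 and 20 are invented for the demonstration.

```php
<?php
// Already-transformed sample data (values invented for the demonstration).
$metadata = simplexml_load_string(
    '<METADATA>'
    . '<DATA><LongValue>Academy</LongValue><Value>2</Value></DATA>'
    . '<DATA><LongValue>Rogers</LongValue><Value>12</Value></DATA>'
    . '<DATA><LongValue>Bartlett</LongValue><Value>20</Value></DATA>'
    . '</METADATA>'
);

$value = '2';

// Substring test: every Value that contains "2" somewhere.
$loose = $metadata->xpath("DATA[Value[contains(., '{$value}')]]");

// Equality test: only a Value whose text is exactly "2".
$exact = $metadata->xpath("DATA[Value = '{$value}']");

echo count($loose), "\n";        // 3
echo count($exact), "\n";        // 1
echo $exact[0]->LongValue, "\n"; // Academy
```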

These two methods are then used in the main search method. It changes slightly: it first looks up the metadata element by its attributes, then checks whether the transformation is needed, and then does the search in the Value column:

public function search($resource, $lookup, $value)
{
    $metadata = $this->getMetadata($resource, $lookup);
    if (!$metadata['transformed']) {
        $metadata->transform();
    }

    list($element) = $metadata->searchColumn('Value', $value);

    return $element;
}

And now the new way of searching is finally done. It searches only in the right column, and the transformation is done on the fly:

$xml = simplexml_load_string($xmlString, 'MetadataElement');
$data = $xml->search('Property', 'Area', 2);
echo $data->LongValue, "\n"; # Academy

That looks nice, and it is easy to use! All the complexity went into MetadataElement. And what does that class look like at a glance?

/**
 * MetadataElement - Example for extending SimpleXMLElement
 *
 * @link http://stackoverflow.com/q/16281205/367456
 */
class MetadataElement extends SimpleXMLElement
{
    /**
     * @param string $resource metadata attribute
     * @param string $lookup   metadata attribute
     * @param string $value    search value
     *
     * @return SimpleXMLElement
     */
    public function search($resource, $lookup, $value)
    {
        $metadata = $this->getMetadata($resource, $lookup);
        if (!$metadata['transformed']) {
            $metadata->transform();
        }

        list($element) = $metadata->searchColumn('Value', $value);

        return $element;
    }

    private function getMetadata($resource, $lookup) {
        $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
        list($metadata) = $this->xpath($xpath);
        return $metadata;
    }

    private function searchColumn($column, $value) {
        return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
    }

    private function asExplodedString() {
        return explode("\t", trim($this));
    }

    public function getParent() {
        list($parent) = $this->xpath('..') + array(0 => NULL);
        return $parent;
    }

    private function transform() {
        $columns = $this->COLUMNS->asExplodedString();

        foreach ($this->DATA as $data) {
            $values  = $data->asExplodedString();
            $data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
            foreach ($columns as $index => $name) {
                $data->addChild($name, $values[$index]);
            }
        }

        $this['transformed'] = 1;
    }
}

Not too bad either. Many small methods with just a few lines of code each, which is (relatively) easy to follow!

I hope this gives some inspiration; I know this was quite a lot of text to read. Have fun!
