PHP: extract multiple data from an HTML table with XPath

Question

I have to read information from an HTML page and transfer it to multiple arrays for further processing. My attempts with XPath have not been successful enough to give me access to the data I want.

The body section contains a table with a varying number of rows, as in the following example:

...
</tr>
<tr>
    <td class="name" title="43PUS6551" datalabel="43PUS6551">
        <span>43PUS6551</span>
    </td>
    <td datalabel="Internetnutzung" class="usage">eingeschränkt</td>
    <td datalabel="Onlinezeit heute" class="bar time">
        <span title="03:20 von 14:00 Stunden">
            <span style="width:23.81%;"/>
        </span>
    </td>
    <td datalabel="Zugangsprofil" class="profile">
        <select name="profile:user6418">
            <option value="filtprof1">Standard</option>
            <option value="filtprof3">Unbeschränkt</option>
            <option value="filtprof4">Gesperrt</option>
            <option value="filtprof5334">Network</option>
            <option value="filtprof5333" selected="selected">Stream</option>
            <option value="filtprof4526">X-Box_One</option>
        </select>
    </td>
    <td datalabel="" class="btncolumn">
        <button type="submit" name="edit" id="uiEdit:user6418" value="filtprof5333" class="icon edit" title="Bearbeiten"/>
    </td>
</tr>
<tr>
...

I need an array that contains the title attribute from line 2 of the row (the first <td>) as the key. But now I'm struggling to get the designated value. I tried different ways: upwards with parent and back down to the <select> node, or with following-sibling. But I'm too stupid to use the XPath syntax properly.

Once I have accomplished that, I need an array that contains the name attribute from the <select> section (line 12) as the key and, as the value, the value attribute of the <option> that is marked as selected.

$filters = [
    'profile:user6418' => 'filtprof5333',
    …
];

Finally, I need one array containing the data from the <option> elements (the same list appears in every row):

$profiles = [
    'Standard' => 'filtprof1',
    'Unbeschränkt' => 'filtprof3',
    …
    'X-Box_One' => 'filtprof4526',
];

Any help with proper XPath hints will be appreciated.

Recommended answer

Here is the code that does what I wanted:

    // convert html response into SimpleXML
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($response);
    $xmlSite = simplexml_import_dom($dom);

    // initialize processing values
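    $devices  = [];     // device label => name of its <select>
    $profiles = [];     // filter label => filter ID
    $filters  = [];     // name of the <select> => ID of the selected filter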

    // parse SimpleXML with xpath to get current data
    $rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');  // these are the rows with assignments of devices to filters
    foreach ($rows as $row) {
        $key = utf8_decode((string)$row->attributes()['title']);    // name (label) of the devices
        if (preg_match('/Alle /', $key)) {                          // skip standard settings
            continue;
        }
        $select = $row->xpath('parent::*//select[@name]');  // go up to the <tr>, then down to its <select>
        if (empty($select)) {                               // skip rows that have no dropdown
            continue;
        }
        $value = (string)$select[0]->attributes()['name'];  // get the current ID ('profile:user*' or 'profile:landevice*')
        $devices[$key] = $value;

        $options = $select[0]->xpath('option');             // the defined filters (dropdown in each row)
        foreach ($options as $option) {
            $profiles[utf8_decode((string)$option)] = (string)$option->attributes()['value'];   // get label and ID of filters
            if (isset($option->attributes()['selected'])) {     // determine the filter currently assigned to the device
                $filters[$value] = (string)$option->attributes()['value'];  // get device (ID) and filter (ID)
            }
        }
    }
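
The question mentions two routes from the name cell to the `<select>`: sideways with `following-sibling`, or up with `parent` and back down. As a minimal sketch of the same extraction using `DOMXPath` directly, the block below builds all three arrays from a trimmed-down copy of the row in the question. The sample HTML, the assumption that the input is UTF-8, and the `<?xml encoding="UTF-8">` hint prepended before parsing are illustrative choices and not part of the original answer. The hint is used here in place of `utf8_decode()`, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical sample input: a shortened copy of the table row from the question.
$html = <<<'HTML'
<table>
<tr>
    <td class="name" title="43PUS6551" datalabel="43PUS6551"><span>43PUS6551</span></td>
    <td datalabel="Zugangsprofil" class="profile">
        <select name="profile:user6418">
            <option value="filtprof1">Standard</option>
            <option value="filtprof3">Unbeschränkt</option>
            <option value="filtprof5333" selected="selected">Stream</option>
        </select>
    </td>
</tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
// Assumption: the input is UTF-8; the prepended hint asks the parser to read it that way.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$devices  = [];  // device label => name of its <select>
$filters  = [];  // name of the <select> => value of the selected <option>
$profiles = [];  // option label => option value

// Each device row starts with a cell whose title equals its datalabel.
foreach ($xpath->query('//tr/td[@title = @datalabel]') as $cell) {
    // Sideways: following-sibling reaches the later cells of the same row.
    // The up-and-down route would be 'parent::tr//select[@name]'.
    $select = $xpath->query('following-sibling::td/select[@name]', $cell)->item(0);
    if ($select === null) {
        continue;  // row without a dropdown
    }

    $name = $select->getAttribute('name');
    $devices[$cell->getAttribute('title')] = $name;

    // string(...) evaluates to '' when no option carries the selected attribute.
    $selected = $xpath->evaluate('string(option[@selected]/@value)', $select);
    if ($selected !== '') {
        $filters[$name] = $selected;
    }

    // The option list repeats in every row, so later rows overwrite identical entries.
    foreach ($xpath->query('option', $select) as $option) {
        $profiles[trim($option->textContent)] = $option->getAttribute('value');
    }
}

print_r($devices);
print_r($filters);
print_r($profiles);
```

Passing the current cell as the second argument to `query()` and `evaluate()` makes the relative expressions start from that cell, which is what lets the axes work per row instead of across the whole document.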
